An Efficient Subsumption Test Pipeline for BS(LRA) Clauses

Abstract. The importance of subsumption testing for redundancy elimination in first-order automated reasoning is well known. Although the problem is already NP-complete for first-order clauses, the test pipelines developed in the meantime efficiently decide subsumption in almost all practical cases. We consider subsumption between first-order clauses of the Bernays-Schönfinkel fragment over linear real arithmetic constraints: BS(LRA). The bottleneck in this setup is deciding implication between the LRA constraints of two clauses. Our new sample point heuristic preempts expensive implication decisions in about 94% of all cases in benchmarks. Combined with filtering techniques for the first-order BS part of clauses, it again results in an efficient subsumption test pipeline for BS(LRA) clauses.


Introduction
The elimination of redundant clauses is crucial for the efficient automatic reasoning in first-order logic. In a resolution [5,50] or superposition setting [4,44], a newly inferred clause might be subsumed by a clause that is already known (forward subsumption) or it might subsume a known clause (backward subsumption). Although the SCL calculi family [1,11,21] does not require forward subsumption tests, a property also inherent to the propositional CDCL (Conflict Driven Clause Learning) approach [8,34,41,55,63], backward subsumption and hence subsumption remains an important test in order to remove redundant clauses.
In this work we present advances in deciding subsumption for constrained clauses, specifically employing the Bernays-Schönfinkel fragment as foreground logic and linear real arithmetic as background theory, BS(LRA). BS(LRA) is of particular interest because it can be used to model supervisors, i.e., components in technical systems that control system functionality. An example of a supervisor is the electronic control unit of a combustion engine. The logics we use to model supervisors and their properties are called SupERLogs: (Sup)ervisor (E)ffective (R)easoning (Log)ics. SupERLogs are instances of function-free first-order logic extended with arithmetic [18], which means BS(LRA) is an example of a SupERLog.
Subsumption is an important redundancy criterion in the context of hierarchic clausal reasoning [6,11,20,35,37]. At the heart of this paper is a new technique to speed up the treatment of linear arithmetic constraints as part of deciding subsumption. For every clause, we store a solution of its associated constraints, which is used to quickly falsify implication decisions, acting as a filter, called the sample point heuristic. In our experiments with various benchmarks, the technique is very effective: It successfully preempts expensive implication decisions in about 94% of cases. We elaborate on these findings in Sect. 4.
For example, consider three BS clauses C₁, C₂, and C₃, none of which subsumes another. Let C₄ be the resolvent of C₁ and C₂ upon the atom P(a, x), i.e., C₄ := Q(a, z, b). Now C₄ backward-subsumes C₃ with matcher σ := {z → x}, i.e., C₄σ ⊂ C₃; thus C₃ is redundant and can be eliminated. Now, consider an extension of the above clauses with some simple LRA constraints following the same reasoning, where ∥ is interpreted as an implication, i.e., clause C₁ stands for ¬(x ≥ 1) ∨ P(a, x) or simply x < 1 ∨ P(a, x). The respective resolvent on the constrained clauses is C₄ := z ≥ 0, z ≥ 1 ∥ Q(a, z, b) or, after constraint simplification, C₄ := z ≥ 1 ∥ Q(a, z, b) because z ≥ 1 implies z ≥ 0. For the constrained clauses, C₄ no longer subsumes C₃ with matcher σ := {z → x}, because z ≥ 0 does not LRA-imply z ≥ 1. Now, if we store the sample point x = 0 as a solution for the constraint of clause C₃, this sample point already reveals that z ≥ 0 does not LRA-imply z ≥ 1. This constitutes the basic idea behind our sample point heuristic. In general, constraints are not just simple bounds as in the above example, and sample points are solutions to the system of linear inequalities of the LRA constraint of a clause.
Please note that our test on LRA constraints is based on LRA theory implication and not on a syntactic notion such as subsumption on the first-order part of the clause. In this sense it is "stronger" than its first-order counterpart. This fact is stressed by the following example, taken from [26, Ex. 2], which shows that first-order implication does not imply subsumption. Let C₁ and C₂ be the two clauses of that example. Then we have C₁ → C₂, but again, for all σ we have C₁σ ⊈ C₂: Constructing σ from left to right we obtain σ := {x → a, y → b, z → c}, but P(a, c) ∉ C₂.
Related Work. Treatment of questions regarding the complexity of deciding subsumption of first-order clauses [27] dates back more than thirty years. Notions of subsumption, varying in generality, are studied in different sub-fields of theorem proving, whereas we restrict our attention to first-order theorem proving. Modern implementations typically decide multiple thousand instances of this problem per second: In [62,Sect. 2], Voronkov states that initial versions of Vampire "seemed to [. . . ] deadlock" without efficient implementations to decide (forward) subsumption.
In order to reduce the number of clauses out of a set of clauses to be considered for pairwise subsumption checking, the best known practice in first-order theorem proving is to use (imperfect) indexing data structures as a means for pre-filtering. Research concerning appropriate techniques is plentiful; see [24,25,27–30,33,39,40,43,45–49,52–54,56,59,61] for an evaluation of these techniques. Here we concentrate on the efficiency of a subsumption check between two clauses and therefore do not take indexing techniques into account. Furthermore, the implication test between two linear arithmetic constraints is of a semantic nature and is not related to any syntactic features of the involved constraints; it can therefore hardly be filtered by a syntactic indexing approach.
In addition to pre-filtering via indexing, almost all of the above-mentioned implementations of first-order subsumption tests rely on additional filters on the clause level. The idea is to generate an abstraction of clauses together with an ordering relation such that the ordering relation must hold between two clauses in order for one clause to subsume the other. Furthermore, the abstraction as well as the ordering relation should be efficiently computable. For example, a necessary condition for a first-order clause C₁ to subsume a first-order clause C₂ is |vars(C₁)| ≥ |vars(C₂)|, i.e., the number of different variables in C₁ must be greater than or equal to the number of variables in C₂. Further abstractions included by various implementations rely on the size of clauses, the number of ground literals, the depth of literals and terms, and the occurring predicate and function symbols. For the BS(LRA) clauses considered here, the structure of the first-order BS part, which consists of predicates and flat terms (variables and constants) only, is not particularly rich.
The exploration of sample points has already been studied in the context of first-order clauses with arithmetic constraints. In [17,36] it was used to improve the performance of iSAT [23] on testing non-linear arithmetic constraints. In general, iSAT tests satisfiability by interval propagation for variables. If intervals get "too small" it typically gives up, however sometimes the explicit generation of a sample point for a small interval can still lead to a certificate for satisfiability. This technique was successfully applied in [17], but was not used for deciding subsumption of constrained clauses.
Motivation. The main motivation for this work is the realization that computing the implication decisions required to treat constraints of the background theory is the bottleneck of a BS(LRA) subsumption check in practice. Inspired by the success of filtering techniques in first-order logic, we devise an exceptionally effective filter for constraints and adapt well-known first-order filters to the BS fragment. Our sample point heuristic for LRA could easily be generalized to other arithmetic theories as well as full first-order logic.
Structure. The paper is structured as follows. After a section defining BS(LRA) and common notions and notation, Sect. 2, we define redundancy notions and our sample point heuristic in Sect. 3. Section 4 justifies the success of the sample point heuristic by numerous experiments in various application domains of BS(LRA). The paper ends with a discussion of the obtained results, Sect. 5. Binaries, utility scripts, benchmarking instances used as input, and the output used for evaluation may be obtained online [13].

Preliminaries
We briefly recall the basic logical formalisms and notations we build upon [10]. Our starting point is a standard many-sorted first-order language for BS with constants (denoted a, b, c), without non-constant function symbols, with variables (denoted w, x, y, z), and predicates (denoted P, Q, R) of some fixed arity. Terms (denoted t, s) are variables or constants. An atom (denoted A, B) is an expression P(t₁, …, tₙ) for a predicate P of arity n. A positive literal is an atom A and a negative literal is a negated atom ¬A. We define comp(A) = ¬A, comp(¬A) = A, |A| = A and |¬A| = A. Literals are usually denoted L, K, H. Formulas are defined in the usual way using quantifiers ∀, ∃ and the boolean connectives ¬, ∨, ∧, →, and ≡.
A clause (denoted C, D) is a universally closed disjunction of literals A₁ ∨ ⋯ ∨ Aₙ ∨ ¬B₁ ∨ ⋯ ∨ ¬Bₘ. Clauses are identified with their respective multisets and all standard multiset operations are extended to clauses. For instance, C ⊆ D means that all literals in C also appear in D respecting their number of occurrences. A clause is Horn if it contains at most one positive literal, i.e., n ≤ 1, and a unit clause if it has exactly one literal, i.e., n + m = 1. We write C⁺ for the set of positive literals, or conclusions of C, i.e., C⁺ := {A₁, …, Aₙ}, and respectively C⁻ for the set of negative literals, or premises of C, i.e., C⁻ := {¬B₁, …, ¬Bₘ}.
If Y is a term, formula, or a set thereof, vars(Y ) denotes the set of all variables in Y , and Y is ground if vars(Y ) = ∅.
The Bernays-Schönfinkel Clause Fragment (BS) in first-order logic consists of first-order clauses where all involved terms are either variables or constants. The Horn Bernays-Schönfinkel Clause Fragment (HBS) consists of all sets of BS Horn clauses.
A substitution σ is a function from variables to terms with a finite domain dom(σ) = {x | xσ ≠ x} and codomain codom(σ) = {xσ | x ∈ dom(σ)}. We denote substitutions by σ, δ, ρ. The application of substitutions is often written postfix, as in xσ, and is homomorphically extended to terms, atoms, literals, clauses, and quantifier-free formulas. A substitution σ is ground if codom(σ) is ground. Let Y denote some term, literal, clause, or clause set. A substitution σ is a grounding for Y if Yσ is ground, and Yσ is a ground instance of Y in this case. We denote by gnd(Y) the set of all ground instances of Y, and by gnd_B(Y) the set of all ground instances over a given set of constants B. The most general unifier mgu(Z₁, Z₂) of two terms/atoms/literals Z₁ and Z₂ is defined as usual, and we assume that it does not introduce fresh variables and is idempotent.
We assume a standard many-sorted first-order logic model theory, and write A ⊨ φ if an interpretation A satisfies a first-order formula φ. A formula ψ is a logical consequence of φ, written φ ⊨ ψ, if A ⊨ ψ for all A such that A ⊨ φ. Sets of clauses are semantically treated as conjunctions of clauses with all variables quantified universally.

Bernays-Schönfinkel with Linear Real Arithmetic
The extension of BS with linear real arithmetic, BS(LRA), is the basis for the formalisms studied in this paper. We consider a standard many-sorted first-order logic with one first-order sort F and with the sort R for the real numbers. Given a clause set N, the interpretations A of our sorts are fixed: R^A = ℝ and F^A = 𝔽. This means that F^A is a Herbrand interpretation, i.e., 𝔽 is the set of first-order constants in N, or a single constant out of the signature if no such constant occurs. Note that this is not a deviation from standard semantics in our context, as for the arithmetic part the canonical domain is considered and the first-order sort has the finite model property over the occurring constants (note that equality is not part of BS).
Constant symbols, arithmetic function symbols, variables, and predicates are uniquely declared together with their respective sort. The unique sort of a constant symbol, variable, predicate, or term is denoted by the function sort(Y ) and we assume all terms, atoms, and formulas to be well-sorted. We assume pure input clause sets, which means the only constants of sort R are (rational) numbers. This means the only constants that we do allow are rational numbers c ∈ Q and the constants defining our finite first-order sort F. Irrational numbers are not allowed by the standard definition of the theory. The current implementation comes with the caveat that only integer constants can be parsed. Satisfiability of pure BS(LRA) clause sets is semi-decidable, e.g., using hierarchic superposition [6] or SCL(T) [11]. Impure BS(LRA) is no longer compact and satisfiability becomes undecidable, but its restriction to ground clause sets is decidable [22].
All arithmetic predicates and functions are interpreted in the usual way. An interpretation of BS(LRA) coincides with A^LRA on arithmetic predicates and functions, and freely interprets free predicates. For pure clause sets this is well-defined [6]. Logical satisfaction and entailment are defined as usual, and use similar notation as for BS.
is part of a timed automaton with two clocks x and y modeled in BS(LRA). It represents a transition from state S 0 to state S 1 that can be traversed only if clock y is at least 5 and that resets y to 0 and increases x by 1.
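A clause that matches this description could look as follows. This is only an illustrative sketch: the predicate names S_0 and S_1 and the primed variables x', y' for the clock values after the transition are assumptions and not necessarily the formulation used in the original example.

```latex
\[
  y \geq 5,\; x' = x + 1,\; y' = 0 \;\parallel\; \neg S_0(x, y) \lor S_1(x', y')
\]
```

Read as an implication, the clause says: if the automaton is in state S_0 with clock values (x, y) and y is at least 5, then it can be in state S_1 with clock values (x + 1, 0).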
Arithmetic terms are constructed from a set X of variables, the set of integer constants c ∈ ℤ, and binary function symbols + and − (written infix). Additionally, we allow multiplication · if one of the factors is an integer constant. Multiplication only serves us as syntactic sugar to abbreviate other arithmetic terms, e.g., x + x + x is abbreviated to 3 · x. Atoms in BS(LRA) are either first-order atoms (e.g., P(13, x)) or (linear) arithmetic atoms (e.g., x < 42). Arithmetic atoms are denoted by λ and may use the predicates ≤, <, ≠, =, >, ≥, which are written infix and have the expected fixed interpretation. We use ◁ as a placeholder for any of these predicates. Predicates used in first-order atoms are called free. First-order literals and related notation are defined as before. Arithmetic literals coincide with arithmetic atoms, since the arithmetic predicates are closed under negation, e.g., ¬(x ≥ 42) is equivalent to x < 42.
BS(LRA) clauses are defined as for BS but using BS(LRA) atoms. We often write clauses in the form Λ ∥ C where C is a clause solely built of free first-order literals and Λ is a multiset of LRA atoms called the constraint of the clause. A clause of the form Λ ∥ C is therefore also called a constrained clause. The semantics of Λ ∥ C is as follows: Λ ∥ C holds iff (⋀_{λ ∈ Λ} λ) → C holds. Note that since the neutral element of conjunction is ⊤, an empty constraint is valid, i.e., equivalent to true. An assignment for a constraint Λ is a substitution (denoted β) that maps all variables in vars(Λ) to real numbers c ∈ ℝ. An assignment is a solution for a constraint Λ if all atoms λ ∈ (Λβ) evaluate to true. A constraint Λ is satisfiable if there exists a solution for Λ. Otherwise it is unsatisfiable. Note that assignments can be extended to C by also mapping variables of the first-order sort accordingly.
A clause or clause set is abstracted if its first-order literals contain only variables or first-order constants. Every clause C is equivalent to an abstracted clause that is obtained by replacing each non-variable arithmetic term t that occurs in a first-order atom by a fresh variable x while adding an arithmetic atom x = t to C. We assume abstracted clauses for theory development, but we prefer non-abstracted clauses in examples for readability, e.g., a unit clause P(3, 5) is considered in the development of the theory as the clause x = 3, y = 5 ∥ P(x, y). In the implementation, we mostly prefer abstracted clauses except that we allow integer constants c ∈ ℤ to appear as arguments of first-order literals. In some cases, this makes it easier to recognize whether two clauses can be matched or not. For instance, we see by syntactic comparison that the two unit clauses P(3, 5) and P(0, 1) have no substitution σ such that P(3, 5) = P(0, 1)σ. For the abstracted versions, on the other hand, x = 3, y = 5 ∥ P(x, y) and u = 0, v = 1 ∥ P(u, v), we can find a matching substitution for the first-order part, σ := {u → x, v → y}, and would have to check the constraints semantically to exclude the matching.
Hierarchic Resolution. One inference rule, foundational to most algorithms for solving constrained first-order clauses, is hierarchic resolution [6]. Its conclusion is called the hierarchic resolvent (of the two clauses in the premise). A refutation is a sequence of resolution steps that produces a clause Λ ∥ ⊥ with A^LRA ⊨ Λδ for some grounding δ. Every set N of BS(LRA) clauses considered here is sufficiently complete [6], because all constants of the arithmetic sort are numbers. Hence hierarchic resolution is sound and refutationally complete for N [6,7]. Hierarchic unit resolution is a special case of hierarchic resolution that only combines two clauses if one of them is a unit clause. Hierarchic unit resolution is sound and complete for HBS(LRA) [6,7], but not even refutationally complete for BS(LRA).
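For orientation, the hierarchic resolution rule is commonly stated along the following lines. This is a sketch under the assumption that the two premises are variable-disjoint and that σ is the most general unifier of L_1 and comp(L_2); the exact side conditions used in [6] may differ.

```latex
\[
  \frac{\Lambda_1 \parallel L_1 \lor C_1
        \qquad
        \Lambda_2 \parallel L_2 \lor C_2
        \qquad
        \sigma = \mathrm{mgu}(L_1, \mathrm{comp}(L_2))}
       {(\Lambda_1, \Lambda_2 \parallel C_1 \lor C_2)\,\sigma}
\]
```

The constraints of both premises are collected in the resolvent, which is why resolvents tend to accumulate arithmetic atoms that later have to be handled by the subsumption test.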
Most algorithms for Bernays-Schönfinkel, first-order logic, and beyond utilize resolution. The SCL(T) calculus for HBS(LRA) uses hierarchic resolution in order to learn from the conflicts it encounters during its search. The hierarchic superposition calculus, on the other hand, derives new clauses via hierarchic resolution based on an ordering. The goal is to either derive the empty clause or a saturation of the clause set, i.e., a state from which no new clauses can be derived. Each of those algorithms must derive new clauses in order to progress, but their subroutines also get progressively slower as more clauses are derived. In order to increase efficiency, it is necessary to eliminate clauses that are obsolete. One measure that determines whether a clause is useful or not is redundancy.
Redundancy. In order to define redundancy for constrained clauses, we need an H-order, i.e., a well-founded, total, strict ordering ≺ on ground literals such that literals in the constraints (in our case arithmetic literals) are always smaller than first-order literals. Such an ordering can be lifted to constrained clauses and sets thereof by its respective multiset extension. Hence, we overload any such order ≺ for literals, constrained clauses, and sets of constrained clauses if the meaning is clear from the context. We define ⪯ as the reflexive closure of ≺ and N^{⪯ Λ∥C} := {D | D ∈ N and D ⪯ Λ ∥ C}. An instance of an LPO [15] with appropriate precedence can serve as an H-order.

Definition 2 (Clause Redundancy). A ground clause Λ ∥ C is redundant with respect to a set N of ground clauses and an H-order ≺ if N^{⪯ Λ∥C} ⊨ Λ ∥ C.
A clause Λ ∥ C is redundant with respect to a clause set N and an H-order ≺ if for all Λ′ ∥ C′ ∈ gnd(Λ ∥ C) the clause Λ′ ∥ C′ is redundant with respect to gnd(N).
If a clause Λ ∥ C is redundant with respect to a clause set N, then it can be removed from N without changing its semantics. Determining clause redundancy is an undecidable problem [11,63]. However, there are special cases of redundant clauses that can be easily checked, e.g., tautologies and subsumed clauses. Techniques for tautology deletion and subsumption deletion are the most common elimination techniques in modern first-order provers.
A tautology is a clause that evaluates to true independent of the predicate interpretation or assignment. It is therefore redundant with respect to all orders and clause sets; even the empty set.

Corollary 3 (Tautology for Constrained Clauses).
Since ¬(Λ ∥ C) is essentially ground (by existential closure and skolemization), it can be solved with an appropriate SMT solver, i.e., an SMT solver that supports unquantified uninterpreted functions coupled with linear real arithmetic. In [2], it is recommended to check only the following conditions for tautology deletion in hierarchic superposition:

Corollary 4 (Tautology Check).
A clause Λ ∥ C is a tautology if the existential closure of Λ is unsatisfiable or if C contains two literals L₁ and L₂ with L₁ = comp(L₂).
The advantage is that the check on the first-order side of the clause is still purely syntactic and corresponds to the tautology check for pure first-order logic. Nonetheless, there are tautologies that are not captured by Corollary 4, e.g., x = y ∥ P(x) ∨ ¬P(y). The SCL(T) calculus, on the other hand, requires no tautology checks because it never learns tautologies as part of its conflict analysis [1,11,21]. This property is also inherent to the propositional CDCL (Conflict Driven Clause Learning) approach [8,34,41,55,63].
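The check of Corollary 4 can be sketched in a few lines. The literal encoding (sign, predicate name, argument tuple) and the function names are assumptions made for this illustration; the satisfiability of the constraint is passed in as a Boolean because it would come from an external LRA solver.

```python
from typing import Iterable, Tuple

# Hypothetical encoding: a literal is (is_positive, predicate, arguments).
Literal = Tuple[bool, str, tuple]


def comp(lit: Literal) -> Literal:
    """Return the complement of a literal (A becomes not-A and vice versa)."""
    sign, pred, args = lit
    return (not sign, pred, args)


def has_complementary_pair(clause: Iterable[Literal]) -> bool:
    """Purely syntactic test: does the clause contain L1 and L2 with L1 = comp(L2)?"""
    literals = set(clause)
    return any(comp(lit) in literals for lit in literals)


def is_tautology_by_corollary4(constraint_is_satisfiable: bool,
                               clause: Iterable[Literal]) -> bool:
    """Sufficient (not necessary) tautology test in the spirit of Corollary 4."""
    return (not constraint_is_satisfiable) or has_complementary_pair(clause)
```

With this encoding, P(x) ∨ ¬P(x) is recognized as a tautology, whereas x = y ∥ P(x) ∨ ¬P(y) is not, because the two atoms differ syntactically.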

Subsumption for Constrained Clauses
A subsumed constrained clause is a clause that is redundant with respect to a single clause in our clause set. Formally, subsumption is defined as follows.

Definition 5 (Subsumption for Constrained Clauses [2]). A constrained clause Λ₁ ∥ C₁ subsumes another constrained clause Λ₂ ∥ C₂ if there exists a substitution σ such that C₁σ ⊆ C₂, vars(Λ₁σ) ⊆ vars(Λ₂), and the universal closure of Λ₂ → (Λ₁σ) holds in LRA.
Eliminating redundant clauses is crucial for the efficient operation of an automatic first-order theorem prover. Although subsumption is considered one of the easier redundancy relationships that we can check in practice, it is still a hard problem in general:

Lemma 6 (Complexity of Subsumption in the BS Fragment). Deciding subsumption for a pair of BS clauses is NP-complete.
Proof. Containment in NP follows from the fact that the size of subsumption matchers is limited by the subsumed clause and set inclusion of literals can be decided in polynomial time. For the hardness part, consider the following polynomial-time reduction from 3-SAT. Take a propositional clause set N where all clauses have length three. Now introduce a 6-place predicate R and encode each propositional variable P by a first-order variable x_P. Then a propositional clause L₁ ∨ L₂ ∨ L₃ can be encoded by an atom R(x_{P1}, p₁, x_{P2}, p₂, x_{P3}, p₃) where pᵢ is 0 if Lᵢ is negative and 1 otherwise, and Pᵢ is the predicate of Lᵢ. This way the clause set N can be represented by a single BS clause C_N. Now construct a clause D that contains all atoms representing the ways a clause of length three can become true by ground atoms over R and the constants 0, 1. For example, it contains atoms like R(0, 0, …) and R(1, 1, …) representing that the first literal of a clause is true. Actually, for each such atom R(0, 0, …) the clause D contains |C_N| copies. Finally, C_N subsumes D if and only if N is satisfiable.
For BS(LRA) (and FOL(LRA)), there also exists research on how to perform the subsumption check in general [2,36], but the literature contains no dedicated indexing or filtering techniques for the constraint part of the subsumption check.
In this section and as the main contribution of this paper, we present the first such filtering techniques for BS(LRA). But first, we explain how to solve the subsumption check for constrained clauses in general.
First-Order Check. The first step of the subsumption check is exactly the same as in first-order logic without arithmetic. We have to find a substitution σ, also called a matcher, such that C₁σ ⊆ C₂. The only difference is that it is not enough to compute one matcher σ; we have to compute all matchers for C₁σ ⊆ C₂ until we find one that satisfies the implication Λ₂ → (Λ₁σ). For instance, there are two matchers for the clauses C₁ := x + y ≥ 0 ∥ Q(x, y) and C₂ := x < 0, y ≥ 0 ∥ Q(x, x) ∨ Q(y, y). The matcher {x → y} satisfies the implication Λ₂ → (Λ₁σ) and {y → x} does not. Our own algorithm for finding matchers is in the style of Stillman, except that we continue after we find the first matcher [27,58].
Implication Check. The universal closure of the implication Λ₂ → (Λ₁σ) can be solved by any SMT solver for the respective theory after we negate it, which yields the formula
∃x₁, …, xₙ. Λ₂ ∧ ¬(Λ₁σ), where {x₁, …, xₙ} = vars(Λ₂). (1)
Note that the resulting formula is already in clause normal form and that the formula can be treated as ground since existential variables can be handled as constants. Intuitively, the universal closure of Λ₂ → (Λ₁σ) asserts that the set of solutions satisfying Λ₂ is a subset of the set of solutions satisfying Λ₁σ.
Example 7. Let us now look at an example to illustrate the role that formula (1) plays in deciding subsumption. In our example, we have three clauses, Λ₁ ∥ C₁, Λ₂ ∥ C₂, and Λ₃ ∥ C₂. Our goal is to test whether Λ₁ ∥ C₁ subsumes the other two clauses. As our first step, we try to find a substitution σ such that C₁σ ⊆ C₂. The most general substitution fulfilling this condition is σ := {z → x, u → 2}. Next, we check whether Λ₁σ is implied by Λ₂ and Λ₃. Normally, we would do so by solving the formula (1) with an SMT solver, but to help our intuitive understanding, we instead look at their solution sets depicted in Fig. 1. Note that Λ₁σ simplifies to Λ₁σ := y ≥ 0, y ≤ 2, y ≤ 2·x, y ≥ 2·x − 4.
Here we see that the solution set for Λ₂ is a subset of the solution set for Λ₁σ. Hence, Λ₂ implies Λ₁σ, which means that Λ₂ ∥ C₂ is subsumed by Λ₁ ∥ C₁. The solution set for Λ₃ is not a subset of the solution set for Λ₁σ. For instance, the assignment β₂ := {x → 3, y → 1} is a counterexample and therefore a solution to the respective instance of formula (1). Hence, Λ₁ ∥ C₁ does not subsume Λ₃ ∥ C₂.
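The first-order check, i.e., the enumeration of all matchers σ with C₁σ ⊆ C₂, can be sketched with a simple backtracking search. This is a minimal illustration and not the Stillman-style algorithm of the implementation; the `Var` class, the literal encoding (sign, predicate, arguments), and the function names are assumptions. Variables of the target clause are treated like constants, and each literal of C₂ is used at most once so that multiset inclusion is respected.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class Var:
    """A first-order variable; every other term (str, int) is a constant."""
    name: str


Literal = Tuple[bool, str, tuple]   # (is_positive, predicate, arguments)
Matcher = Dict[Var, object]


def match_literal(pattern: Literal, target: Literal,
                  sigma: Matcher) -> Optional[Matcher]:
    """Extend sigma so that pattern·sigma equals target, or return None."""
    sign_p, pred_p, args_p = pattern
    sign_t, pred_t, args_t = target
    if (sign_p, pred_p, len(args_p)) != (sign_t, pred_t, len(args_t)):
        return None
    extended = dict(sigma)
    for p, t in zip(args_p, args_t):
        if isinstance(p, Var):
            # Bind p to t, or fail if p is already bound to something else.
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended


def all_matchers(c1: List[Literal], c2: List[Literal]) -> Iterator[Matcher]:
    """Yield every distinct matcher sigma with c1·sigma contained in c2."""
    def search(i: int, sigma: Matcher, used: frozenset) -> Iterator[Matcher]:
        if i == len(c1):
            yield sigma
            return
        for j, target in enumerate(c2):
            if j in used:
                continue
            extended = match_literal(c1[i], target, sigma)
            if extended is not None:
                yield from search(i + 1, extended, used | {j})

    seen: List[Matcher] = []
    for sigma in search(0, {}, frozenset()):
        if sigma not in seen:
            seen.append(sigma)
            yield sigma
```

For the clauses Q(x, y) and Q(x, x) ∨ Q(y, y) from the First-Order Check paragraph, the search yields the two matchers {x → x, y → x} and {x → y, y → y}, i.e., {y → x} and {x → y} once identity bindings are dropped.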
Excess Variables. Note that in general it is not sufficient to find a substitution σ that matches the first-order parts to also match the theory constraints: C₁σ ⊆ C₂ does not generally imply vars(Λ₁σ) ⊆ vars(Λ₂). In particular, if Λ₁ contains variables that do not appear in the first-order part C₁, then these must be projected to Λ₂. We arrive at a variant of (1), that is, ∃x₁, …, xₙ ∀y₁, …, yₘ. Λ₂ ∧ ¬(Λ₁σ) where {x₁, …, xₙ} = vars(Λ₂) and {y₁, …, yₘ} = vars(Λ₁) \ vars(C₁). Our solution to this problem is to normalize all clauses Λ ∥ C by eliminating all excess variables Y := vars(Λ) \ vars(C) such that vars(Λ) ⊆ vars(C) is guaranteed. For linear real arithmetic this is possible with quantifier elimination techniques, e.g., Fourier-Motzkin elimination (FME). Although these techniques typically cause the size of Λ to increase exponentially, they often behave well in practice. In fact, we get rid of almost all excess variables in our benchmark examples with simplification techniques based on Gaussian elimination with execution time linear in the number of LRA atoms. Given the precondition Y = ∅ achieved by such elimination techniques, we can compute σ as matcher for the first-order parts and then directly use it for testing whether the universal closure of Λ₂ → (Λ₁σ) holds. An alternative solution to the issue of excess variables has been proposed: In [2], the substitution σ is decomposed as σ = δτ, where δ is the first-order matcher and τ is a theory matcher, i.e., dom(τ) ⊆ Y and vars(codom(τ)) ⊆ vars(Λ₂). Then, exploiting Farkas' lemma, the computation of τ is reduced to testing the feasibility of a linear program (restricted to matchers that are affine transformations). The reduction to solving a linear program offers polynomial worst-case complexity but in practice typically behaves worse than solving the variant with quantifier alternations using an SMT solver such as Z3 [36,42].
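To make the elimination of an excess variable concrete, the following sketch performs one Fourier-Motzkin step. It is deliberately restricted to non-strict atoms of the form a₁x₁ + … + aₙxₙ ≤ c, which is an assumption of this illustration; the atom encoding (coefficient dictionary, bound) and the function name are hypothetical, and the sketch does not reflect the Gaussian-elimination-based simplifications mentioned above.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

# Hypothetical encoding: (coeffs, bound) stands for sum(coeffs[v] * v) <= bound.
Atom = Tuple[Dict[str, Fraction], Fraction]


def eliminate(atoms: List[Atom], var: str) -> List[Atom]:
    """One Fourier-Motzkin step: project the variable `var` out of the system."""
    upper, lower, rest = [], [], []
    for coeffs, bound in atoms:
        a = Fraction(coeffs.get(var, 0))
        if a > 0:        # the atom bounds var from above
            upper.append((coeffs, bound, a))
        elif a < 0:      # the atom bounds var from below
            lower.append((coeffs, bound, -a))
        else:            # var does not occur; keep the atom
            kept = {v: Fraction(c) for v, c in coeffs.items() if v != var}
            rest.append((kept, Fraction(bound)))
    # Combine every upper bound with every lower bound so that var cancels.
    for cu, bu, au in upper:
        for cl, bl, al in lower:
            combined: Dict[str, Fraction] = {}
            for v in set(cu) | set(cl):
                if v == var:
                    continue
                c = Fraction(cu.get(v, 0)) / au + Fraction(cl.get(v, 0)) / al
                if c != 0:
                    combined[v] = c
            rest.append((combined, Fraction(bu) / au + Fraction(bl) / al))
    return rest
```

For example, eliminating y from x − y ≤ 0, y ≤ 5 leaves x ≤ 5. The quadratic pairing of upper and lower bounds in each step is the source of the blow-up in the size of Λ.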
Filtering First-Order Literals. Even though deciding implication of theory constraints is in practice more expensive than constructing a matcher and deciding inclusion of first-order literals, we still incorporate some lightweight filters for our evaluation. Inspired by Schulz [54], we choose three features such that every feature f maps clauses to ℕ₀, and f(C₁) ≤ f(C₂) is necessary for C₁ to subsume C₂. The features are: |C⁺|, the number of positive first-order literals in C; |C⁻|, the number of negative first-order literals in C; and the number of occurrences of constants in C.
Sample Point Heuristic. The majority of subsumption tests fail because we cannot find a fitting substitution for their first-order parts. In our experiments, between 66.5% and 99.9% of subsumption tests failed this way. This means our tool has to check whether one theory constraint implies the other in less than 33.5% of the cases. Despite this, without filtering on the constraint implication tests, our tool spends more time on implication checks than on the first-order part of the subsumption tests. The reason is that constraint implication tests are typically much more expensive than the first-order part of a subsumption test. For this reason, we developed the sample point heuristic, which is much faster to execute than a full constraint implication test but still filters out the majority of implications that do not hold (in our experiments between 93.8% and 100%).
The idea behind the sample point heuristic is straightforward. We store for each clause Λ ∥ C a sample solution β for its theory constraint Λ. Before we execute a full constraint implication test, we simply evaluate whether the sample solution β for Λ₂ is also a solution for Λ₁σ. If this is not the case, then β is a solution for (1) and a counterexample for the implication. If β is a solution for Λ₁σ, then the heuristic returns unknown and we have to execute a full constraint implication test, i.e., solve the SMT problem (1).
Often it is possible to get our sample solutions for free. Theorem provers based on hierarchic superposition typically check for every new clause Λ ∥ C whether Λ is satisfiable in order to eliminate tautologies. This means we can already use this tautology check to compute and store a sample solution for every new clause without extra cost. We only need to pick a solver for the check that returns a solution as a certificate of satisfiability. Although the SCL(T) calculus never learns any tautologies, it is also possible to get a sample solution for free as part of its conflict analysis [11].
Example 8. We revisit Example 7 to illustrate the sample point heuristic. During the tautology check for Λ₂ ∥ C₂ and Λ₃ ∥ C₂, we determined that β₁ := {x → 2, y → 1} is a sample solution for Λ₂ and β₂ := {x → 3, y → 1} a sample solution for Λ₃. Since Λ₂ implies Λ₁σ, all sample solutions for Λ₂ automatically satisfy Λ₁σ. This is the reason why the sample point heuristic never filters out an implication that actually holds, i.e., it returns unknown when we test whether Λ₂ implies Λ₁σ. The assignment β₂, on the other hand, does not satisfy Λ₁σ. Hence, the sample point heuristic correctly claims that Λ₃ does not imply Λ₁σ. Note that we could also have chosen β₁ as the sample point for Λ₃. In this case, the sample point heuristic would also return unknown for the implication Λ₃ → Λ₁σ although the implication does not hold.
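The sample point heuristic amounts to evaluating Λ₁σ under a stored assignment. The following sketch replays Example 8 with the simplified constraint Λ₁σ = y ≥ 0, y ≤ 2, y ≤ 2·x, y ≥ 2·x − 4. The atom encoding (coefficient dictionary, predicate, bound), the result strings, and all names are assumptions made for illustration.

```python
import operator
from fractions import Fraction
from typing import Dict, List, Tuple

# Hypothetical encoding: (coeffs, op, bound) stands for sum(coeffs[v] * v) op bound.
Atom = Tuple[Dict[str, int], str, int]

OPS = {"<=": operator.le, "<": operator.lt, "=": operator.eq,
       "!=": operator.ne, ">": operator.gt, ">=": operator.ge}


def holds(atom: Atom, beta: Dict[str, int]) -> bool:
    """Evaluate one linear arithmetic atom under the assignment beta."""
    coeffs, op, bound = atom
    value = sum(Fraction(a) * Fraction(beta[v]) for v, a in coeffs.items())
    return OPS[op](value, Fraction(bound))


def sample_point_heuristic(lambda1_sigma: List[Atom], beta2: Dict[str, int]) -> str:
    """Return "rejected" if beta2 refutes the implication, else "unknown"."""
    if all(holds(atom, beta2) for atom in lambda1_sigma):
        return "unknown"    # a full implication test is still required
    return "rejected"       # beta2 is a counterexample, i.e., a solution of (1)


# Lambda_1 sigma from Example 7, written as atoms over x and y.
LAMBDA1_SIGMA: List[Atom] = [
    ({"y": 1}, ">=", 0),
    ({"y": 1}, "<=", 2),
    ({"y": 1, "x": -2}, "<=", 0),    # y <= 2*x
    ({"y": 1, "x": -2}, ">=", -4),   # y >= 2*x - 4
]
BETA1 = {"x": 2, "y": 1}   # sample solution for Lambda_2
BETA2 = {"x": 3, "y": 1}   # sample solution for Lambda_3
```

Evaluating the heuristic with BETA1 gives "unknown" and with BETA2 gives "rejected", matching Example 8. The cost is linear in the size of Λ₁σ, which is why this stage can run before any solver call.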
Trivial Cases. Subsumption tests become much easier if the constraint Λᵢ of one of the participating clauses is empty. We use two heuristic filters to exploit this fact. We highlight them here because they already exclude some subsumption tests before we reach the sample point heuristic in our implementation.
The empty conclusion heuristic exploits that Λ₁ is valid if Λ₁ is empty. In this case, all implications Λ₂ → (Λ₁σ) hold because Λ₁σ evaluates to true under any assignment. So by checking whether Λ₁ = ∅, we can quickly determine whether Λ₂ → (Λ₁σ) holds for some pairs of clauses. Note that in contrast to the sample point heuristic, this heuristic is used to find valid implications.
The empty premise test exploits that Λ 2 is valid if Λ 2 is empty. In this case, an implication Λ 2 → (Λ 1 σ) may only hold if Λ 1 σ simplifies to the empty set as well. This is the case because any inequality in the canonical form n i=1 a i x i c either simplifies to true (because a i = 0 for all i = 1, . . . , n and 0 c holds) and can be removed from Λ 1 σ, or the inequality eliminates at least one assignment as a solution for Λ 1 σ [51]. So if Λ 2 = ∅, we check whether Λ 1 σ simplifies to the empty set instead of solving the SMT problem (1).
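The two trivial-case filters can be sketched as follows. The atom encoding `(coeffs, op, bound)` and the names `simplify` and `trivial_cases` are assumptions of this sketch; it also assumes that the conclusion is already given in canonical form after applying the matcher.

```python
from fractions import Fraction
import operator

# Hypothetical encoding: an atom (coeffs, op, bound) stands for
# sum(coeffs[v] * v) op bound; a constraint is a list (conjunction) of atoms.
OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt, "!=": operator.ne}

def is_trivially_true(atom):
    """An atom is trivially true if all coefficients are 0 and `0 op bound` holds."""
    coeffs, op, bound = atom
    all_zero = all(Fraction(a) == 0 for a in coeffs.values())
    return all_zero and OPS[op](Fraction(0), Fraction(bound))

def simplify(constraint):
    """Remove all trivially true atoms; every remaining atom excludes some assignment."""
    return [atom for atom in constraint if not is_trivially_true(atom)]

def trivial_cases(premise, conclusion):
    """Decide the implication premise -> conclusion in the two trivial cases.
    Returns "holds", "fails", or "unknown" (neither filter applies)."""
    if not conclusion:   # empty conclusion heuristic
        return "holds"
    if not premise:      # empty premise test
        return "holds" if not simplify(conclusion) else "fails"
    return "unknown"

x_le_0 = ({"x": 1}, "<=", 0)   # x <= 0
zero_le_0 = ({"x": 0}, "<=", 0)  # 0 <= 0, e.g. left over after substitution

print(trivial_cases([x_le_0], []))        # holds
print(trivial_cases([], [zero_le_0]))     # holds
print(trivial_cases([], [x_le_0]))        # fails
print(trivial_cases([x_le_0], [x_le_0]))  # unknown
```

Note that the empty premise test is decisive whenever it applies, whereas the empty conclusion heuristic can only confirm implications.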

Pipeline.
We call our approach a pipeline since it combines multiple procedures, which we call stages, that vary in complexity and are independent in principle, for the overall aim of efficiently testing subsumption. Pairs of clauses that "make it through" all stages are those for which the subsumption relation holds. The pipeline is designed with two goals in mind: (1) to reject as many pairs of clauses as early as possible, and (2) to move stages further towards the end of the pipeline the more expensive they are.
The pipeline consists of six stages, all of which are mentioned above. We divide the pipeline into two phases, the first-order phase (FO-phase), consisting of two stages, and the constraint phase (C-phase), consisting of four stages. First-order filtering rejects all pairs of clauses for which $f(C_1) > f(C_2)$ holds. Then, matching constructs all matchers $\sigma$ such that $C_1\sigma \subseteq C_2$. Every matcher is individually tested in the constraint phase. Technically, this means that the input of all following stages is not just a pair of clauses, but a triple of two clauses and a matcher. The constraint phase then proceeds with the empty conclusion heuristic and the empty premise test to accept (resp. reject) all trivial cases of the constraint implication test. The next stage is the sample point heuristic. If the sample solution $\beta_2$ for $\Lambda_2$ is not a solution for $\Lambda_1\sigma$ (i.e., $\not\models \Lambda_1\sigma\beta_2$), then the matcher $\sigma$ is rejected. Otherwise (i.e., $\models \Lambda_1\sigma\beta_2$), the implication test $\Lambda_2 \rightarrow (\Lambda_1\sigma)$ is performed by solving the SMT problem (1) to produce the overall result of the pipeline and finally determine whether subsumption holds.
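The order of the six stages can be illustrated with a deliberately small Python sketch. Everything in it is an assumption for illustration: the `Clause` class, a toy constraint language in which `{var: b}` means `var <= b`, a feature function that just counts literals, a matcher stage that only tries the identity substitution, and an implication test that is exact only for this toy language. A real implementation would use proper matching and an SMT solver for the last stage.

```python
from dataclasses import dataclass

@dataclass
class Clause:
    literals: frozenset  # first-order part
    bounds: dict         # toy constraint: {var: b} means var <= b (conjunction)
    sample: dict         # stored sample solution of the constraint

def feature(c):
    # Toy stand-in for the feature function f of first-order filtering.
    return len(c.literals)

def matchers(c1, c2):
    # Toy matching: only the identity matcher, if C1 is literally a subset of C2.
    if c1.literals <= c2.literals:
        yield {}

def satisfies(point, bounds):
    # Variables that the sample does not mention are assumed to be 0.
    return all(point.get(v, 0) <= b for v, b in bounds.items())

def implies(premise, conclusion):
    # Exact for this toy language only; stands in for the SMT-based test.
    return all(v in premise and premise[v] <= b for v, b in conclusion.items())

def subsumes(c1, c2, log):
    if feature(c1) > feature(c2):                # stage 1: first-order filtering
        return False
    for sigma in matchers(c1, c2):               # stage 2: matching
        conclusion = c1.bounds                   # constraint of c1 under sigma (identity)
        if not conclusion:                       # stage 3: empty conclusion heuristic
            return True
        if not c2.bounds:                        # stage 4: empty premise test; a toy
            continue                             # bound is never trivially true
        if not satisfies(c2.sample, conclusion): # stage 5: sample point heuristic
            log.append("sample point rejected")
            continue
        log.append("implication test")           # stage 6: exact implication test
        if implies(c2.bounds, conclusion):
            return True
    return False

c1 = Clause(frozenset({"P(x)"}), {"x": 5}, {"x": 5})
c2 = Clause(frozenset({"P(x)", "Q(x)"}), {"x": 3}, {"x": 3})
c3 = Clause(frozenset({"P(x)", "Q(x)"}), {"x": 7}, {"x": 7})

log = []
print(subsumes(c1, c2, log), subsumes(c1, c3, log), subsumes(c2, c1, log))
print(log)
```

In the second call the stored sample point of `c3` already refutes the implication, so the expensive last stage is never reached for that pair.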

Experimentation
To evaluate our new approach on benchmark instances derived from three BS(LRA) applications, we implemented all presented techniques and their combination in the form of a pipeline in the theorem prover SPASS-SPL, a prototype for BS(LRA) reasoning.
Note that SPASS-SPL contains more than one approach for BS(LRA) reasoning, e.g., the Datalog hammer for HBS(LRA) reasoning [10]. These modes operate independently, and the desired mode is chosen via a command-line option. The reasoning approach discussed here is the current default option. On the first-order side, SPASS-SPL consists of a simple saturation prover based on hierarchic unit resolution, see Algorithm 1. It resolves unit clauses with other clauses until either the empty clause is derived or no new clauses can be derived. Note that this procedure is only complete for Horn clauses.
For arithmetic reasoning, SPASS-SPL relies on SPASS-SATT, our sound and complete CDCL(LA) solver for quantifier-free linear real and linear mixed/integer arithmetic [12]. SPASS-SATT implements a version of the dual simplex algorithm fine-tuned towards SMT solving [16]. In order to ensure soundness, SPASS-SATT represents all numbers with the help of the arbitrary-precision arithmetic library FLINT [31]. This means all calculations, including the implication test and the sample point heuristic, are always exact and thus free of numerical errors.
The most relevant part of SPASS-SPL with regard to this paper is that it performs tautology and subsumption deletion to eliminate redundant clauses. As a preprocessing step, SPASS-SPL eliminates all tautologies from the set of input clauses. Similarly, the function resolvents(C, N) (see Line 4 of Algorithm 1) filters out all newly derived clauses that are tautologies. Note that we also use these tautology checks to eliminate all excess variables and to store sample solutions for all remaining clauses. After each iteration of the algorithm, we also check for subsumed clauses. We first eliminate newly generated clauses by forward subsumption (see Line 6 of Algorithm 1), then use the remaining clauses for backward subsumption (see Line 8 of Algorithm 1).
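The order of forward and backward subsumption after each iteration can be sketched as below. This is not Algorithm 1 itself; the function name `delete_subsumed` and the pluggable `subsumes` predicate are assumptions of this sketch, and mutual subsumption among the newly generated clauses is not handled. The toy usage replaces constrained clauses by propositional clauses, for which subsumption is the subset relation.

```python
def delete_subsumed(known, new, subsumes):
    """One redundancy-elimination step after an iteration of the saturation loop.

    Forward subsumption: drop every new clause subsumed by a known clause.
    Backward subsumption: drop every known clause subsumed by a remaining new clause.
    """
    new_kept = [c for c in new if not any(subsumes(k, c) for k in known)]
    known_kept = [k for k in known if not any(subsumes(c, k) for c in new_kept)]
    return known_kept, new_kept

def subset(c, d):
    # Toy subsumption test for propositional clauses represented as frozensets.
    return c <= d

known = [frozenset({"p", "q"}), frozenset({"r"})]
new = [frozenset({"r", "s"}), frozenset({"p"})]

known_kept, new_kept = delete_subsumed(known, new, subset)
print(known_kept)  # [frozenset({'r'})]
print(new_kept)    # [frozenset({'p'})]
```

Running forward subsumption first keeps the number of backward tests small, because clauses that are about to be discarded are never used as subsumers.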
Benchmarks. Our benchmarking instances come out of three different applications. (1) A supervisor for an automobile lane change assistant, formulated in the Horn fragment of BS(LRA) [9,10] (five instances, referred to as lc in aggregate). (2) The formalization of reachability for non-deterministic timed automata, formulated in the non-Horn fragment of BS(LRA) [20] (one instance, referred to as tad). (3) Formalizations of variants of mutual exclusion protocols, such as the bakery protocol [38], also formulated in the non-Horn fragment of BS(LRA) [19] (one instance, referred to as bakery). The machine used for benchmarking features an Intel Xeon W-1290P CPU (10 cores, 20 threads, up to 5.2 GHz) and 64 GiB DDR4-2933 ECC main memory. Runtime was limited to ten minutes, and memory usage was not limited.
Evaluation. In Table 1 we give an overview of how many pairs of clauses advance how far in the pipeline (in thousands). Rows with grey background refer to a stage of the pipeline and show which portion of pairs of clauses were kept, relative to the previous stage. Rows with white background refer to (virtual) sets of clauses, their absolute size, and their size relative to the number of attempted tests, as well as the condition(s) established. The three groups of columns refer to groups of benchmark instances. Results vary greatly between lc and the aggregate of bakery and tad. In lc the relative number of subsumed clauses is significantly smaller (0.0027% compared to 0.0416%). FO Matching eliminates a large number of pairs in lc, because the number of predicate symbols and their arities (lc1, ..., lc4: 36 predicates, arities up to 5; lc5: 53 predicates, arities up to 12) are greater than in bakery (11 predicates, all of arity 2) and tad (4 predicates, all of arity 2).

Binary Classifiers.
To evaluate the performance of each stage of the proposed test pipeline, we view each stage individually as a binary classifier on pairs of constrained clauses. The two classes we consider are "subsumes" (positive outcome) and "does not subsume" (negative outcome). Each stage of the pipeline computes a prediction on the actual result of the overall pipeline. We are thus interested in minimizing two kinds of errors: (1) When one stage of the pipeline predicts that the subsumption test will succeed (the prediction is positive) but it fails (the actual result is negative), called false positive (FP). (2) When one stage of the pipeline predicts that the subsumption test will fail (the prediction is negative) but it succeeds (the actual result is positive), called false negative (FN). Dually, a correct prediction is called true positive (TP) or true negative (TN). For each stage, at least one kind of error is excluded by design: First-order filtering and the sample point heuristic never produce false negatives. The empty conclusion heuristic never produces false positives. The empty premise test is perfect, i.e., it produces neither false positives nor false negatives, with the caveat of not always being applicable. The last stage (implication test) decides the overall result of the pipeline, and thus is also perfect.
For evaluation of binary classifiers, we use four different measures (two symmetric pairs): The first pair, specificity (SPC) and positive predictive value (PPV), see (2), is relevant only in presence of false positives (the measures approach 1 as FP approaches 0).
$$\mathrm{SPC} = \frac{TN}{TN + FP} \qquad \mathrm{PPV} = \frac{TP}{TP + FP} \tag{2}$$
The second pair, sensitivity (SEN) and negative predictive value (NPV), see (3), is relevant only in presence of false negatives (the measures approach 1 as FN approaches 0).
$$\mathrm{SEN} = \frac{TP}{TP + FN} \qquad \mathrm{NPV} = \frac{TN}{TN + FN} \tag{3}$$
Specificity (resp. sensitivity) might be considered the "success rate" in our setup. They answer the question: "Given the actual result of the pipeline is 'not subsumed' (resp. 'subsumed'), in how many cases does this stage predict correctly?" A specificity (resp. sensitivity) of 0.99 means that the classifier produces a false positive (resp. negative), i.e., a wrong prediction, in one out of one hundred actually negative (resp. positive) cases. Both measures are independent of the prevalence of particular actual results, i.e., the measures are not biased by instances that feature many (or few) subsumed clauses. On the other hand, positive and negative predictive value are biased by prevalence. They answer the following question: "Given this stage of the pipeline predicts 'subsumed' (resp. 'not subsumed'), how likely is it that the actual result indeed is 'subsumed' (resp. 'not subsumed')?"
In Table 2 we present for all non-perfect stages of the pipeline specificity (for those that produce false positives) and sensitivity (for those that produce false negatives) as well as the (positive/negative) predictive value. Note that the sample point heuristic has an exceptionally high specificity, still above 93% in the benchmarks where it performed worst. For the benchmarks bakery and tad it even performs perfectly. Combined, this gives a specificity of above 99.99%. Considering FO Filtering, we expect limited performance, since the structure of terms in BS is flat compared to the rich structure of terms as trees in full first-order logic. This is evidenced by a comparatively low specificity of 35%. However, this classifier is very easy to compute, so it pays for itself. FO Matching is a much better classifier, at an aggregate sensitivity of 93%. Even though computing this classifier is NP-complete, this is not problematic in practice.
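The difference between prevalence-independent and prevalence-biased measures can be made concrete with a small computation. The function below uses the standard definitions SPC = TN/(TN+FP), PPV = TP/(TP+FP), SEN = TP/(TP+FN), and NPV = TN/(TN+FN). The counts are made up for illustration and are not taken from Table 2.

```python
from fractions import Fraction

def measures(tp, fp, tn, fn):
    """Return the four classifier measures as exact fractions
    (None if a denominator is zero, i.e. the measure is undefined)."""
    def ratio(num, den):
        return Fraction(num, den) if den else None
    return {
        "SPC": ratio(tn, tn + fp),  # specificity
        "PPV": ratio(tp, tp + fp),  # positive predictive value
        "SEN": ratio(tp, tp + fn),  # sensitivity
        "NPV": ratio(tn, tn + fn),  # negative predictive value
    }

# Made-up counts: 10 actual positives, 100 actual negatives.
base = measures(tp=10, fp=10, tn=90, fn=0)
# Same classifier behaviour, but ten times as many actual negatives.
rare = measures(tp=10, fp=100, tn=900, fn=0)

print(base["SPC"], rare["SPC"])  # 9/10 9/10
print(base["PPV"], rare["PPV"])  # 1/2 1/11
```

Specificity is unchanged when only the share of negative cases grows, whereas the positive predictive value drops, which is the bias by prevalence described above.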
In Table 3 we focus on the runtime improvement achieved by the sample point heuristic. In the first two lines (Bottleneck), we highlight how much slower testing implication of constraints (the C-phase) is compared to treating the first-order part (the FO-phase). This is equivalent to the time taken for the C-phase per pair of clauses (that reach at least the first stage of the C-phase) divided by the time taken for the FO-phase per pair of clauses. We see that without the sample point heuristic, we can expect the constraint implication test to take hundreds to thousands of times longer than the FO-phase. Adding the sample point heuristic decreases this ratio to below one hundred. In the fourth line (avg. pipeline runtime) we do not give a ratio, but the average time it takes to compute the whole pipeline. We achieve millions of subsumption checks per second. In the fifth line (Speedup), we take the time that all C-phase stages combined take per pair of clauses that reach at least the first stage of the C-phase, and take the ratio to the same time without applying the sample point heuristic. In the sixth line (Benefit-to-cost), we consider the time taken to compute the sample point vs. the time it saves. The benefit is about two orders of magnitude greater than the cost.

Conclusion
Our next step will be the integration of the subsumption test in the backward subsumption procedure of an SCL-based reasoning procedure for BS(LRA) [11], which is currently under development.
There are various ways to improve the sample point heuristic. One improvement would be to store and check multiple sample points per clause. For instance, whenever the sample point heuristic fails and the implication test for $\Lambda_2 \rightarrow (\Lambda_1\sigma)$ also fails, store the solution to (1) as an additional sample point for $\Lambda_2$. The new sample point will filter out any future implication tests with $\Lambda_1\sigma$ or similar constraints. However, testing too many sample points might lead to costs outweighing benefits. A potential solution to this problem would be score-based garbage collection, as done in SAT solvers [57]. Another way to store and check multiple sample points per clause is to store a compact description of a set of points that is easy to check against. For instance, we can store the center point and edge length of the largest orthogonal hypercube contained in the solutions of a constraint, which is equivalent to infinitely many sample points. Computing the largest orthogonal hypercube for an LRA constraint is not much harder than finding a sample solution [14]. Checking whether a cube is contained in an LRA constraint works almost the same as evaluating a sample point [14].
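To illustrate why checking a cube is almost as cheap as checking a point, consider the following sketch. It relies on the fact that the maximum of a linear term $\sum_i a_i x_i$ over an axis-parallel cube with center $z$ and edge length $e$ is $\sum_i a_i z_i + \frac{e}{2}\sum_i |a_i|$. The atom encoding, the restriction to the relations `<` and `<=`, and the function name `cube_satisfies` are assumptions of this sketch.

```python
from fractions import Fraction

def cube_satisfies(center, edge, constraint):
    """Return True iff every point of the closed axis-parallel cube with the given
    center and edge length satisfies all atoms of `constraint`.

    Hypothetical encoding: an atom (coeffs, op, bound) stands for
    sum(coeffs[v] * v) op bound, with op restricted to "<" and "<=".
    """
    half = Fraction(edge) / 2
    for coeffs, op, bound in constraint:
        if op not in ("<", "<="):
            raise ValueError("this sketch only supports '<' and '<='")
        at_center = sum(Fraction(a) * center[v] for v, a in coeffs.items())
        # Largest value the linear term takes anywhere on the cube.
        worst = at_center + half * sum(abs(Fraction(a)) for a in coeffs.values())
        ok = worst < bound if op == "<" else worst <= bound
        if not ok:
            return False
    return True

# Cube [0, 2] x [0, 2], stored as center (1, 1) and edge length 2.
center, edge = {"x": 1, "y": 1}, 2

print(cube_satisfies(center, edge, [({"x": 1, "y": 1}, "<=", 4)]))   # True
print(cube_satisfies(center, edge, [({"x": 1, "y": 1}, "<", 4)]))    # False
print(cube_satisfies(center, edge, [({"x": 1, "y": -1}, "<=", 1)]))  # False
```

If the cube stored for the premise is not contained in the conclusion, some solution of the premise violates the conclusion and the implication can be rejected; otherwise the answer remains unknown, exactly as with a single sample point.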
Although we developed our sample point technique for the BS(LRA) fragment, it is obvious that it will also work for the overall FOL(LRA) clause fragment, because this extension does not affect the LRA constraint part of clauses. From an automated reasoning perspective, satisfiability of the FOL(LRA) and BS(LRA) fragments (clause sets) is undecidable in both cases. Actually, satisfiability of a BS(LRA) clause set is already undecidable if the first-order part is restricted to a single monadic predicate [32]. The first-order part of BS(LRA) is decidable and therefore enables effective guidance for an overall reasoning procedure [11]. From an application perspective, the BS(LRA) fragment already encompasses a number of used (sub)languages. For example, timed automata [3] and a number of extensions thereof are contained in the BS(LRA) fragment [60].
We also believe that the sample point heuristic will speed up the constraint implication test for FOL(LIA), i.e., first-order clauses over linear integer arithmetic, FOL(NRA), i.e., first-order clauses over non-linear real arithmetic, and other combinations of FOL with arithmetic theories. However, the non-linear case will require a more sophisticated setup due to the nature of test points in this case, e.g., a solution may contain root expressions.