1 Introduction

Satisfiability Modulo Theories (SMT) is a critical area of research focusing on the satisfiability of first-order logic formulas. The growth of SMT springs from the success of propositional satisfiability (SAT) solving, which took off with the conflict-driven clause-learning solvers of the late 1990s and early 2000s. SMT aims to generalize the achievements of SAT solvers from propositional logic to fragments of first-order logic. Research in SMT has broadened its scope to more expressive theories, such as equalities and uninterpreted functions, arrays, bit-vectors, floating-point arithmetic, difference logic, and linear and non-linear arithmetic.

Our research focuses on Arithmetic Theories, whose atomic formulas are polynomial equations or inequalities over real or integer variables. Arithmetic Theories can be categorized into four distinctive sets based on the form of the formulas and the domain of the variables: Linear Real Arithmetic (LRA), Linear Integer Arithmetic (LIA), Non-linear Real Arithmetic (NRA), and Non-linear Integer Arithmetic (NIA). Considering all four subsets together, the SMT problems of Arithmetic Theories are collectively denoted as SMT(Arithmetic). SMT(Arithmetic) is fundamental in numerous applications such as program verification [6], termination analysis [9], symbolic execution [11], test-case generation [26], program synthesis, optimization [23], and scheduling [31]. Formal verification of embedded software and hybrid systems [5] often requires deciding the satisfiability of quantifier-free first-order formulas involving arithmetic. For many users of SMT solvers, the solver’s performance is a bottleneck for their application, so improving solver performance remains a top priority for solver developers.

Today, most state-of-the-art SMT solvers remain single-threaded, and previous efforts have mainly been devoted to improving the techniques and heuristics of sequential SMT solvers. Beyond enhancing the efficiency of the sequential solver, it is natural to boost solver performance through distributed SMT solving, given the increasing availability of computational resources. Current research in distributed SMT solving falls into two main directions: portfolio and divide-and-conquer. Extensive investigations of partitioning strategies have been predominantly spearheaded by the OpenSMT2 team and are well documented in [32]; that study also explores hybrid strategies that combine both methods and shows improvements.

Portfolio solvers deploy multiple solvers or varying configurations of a single solver that attempt to solve identical or perturbed but equivalent SMT problems concurrently [33]. Portfolio solvers are limited by the best possible sequential performance. Consequently, the alternative divide-and-conquer method is a compelling approach. In this approach, the original problem is partitioned into sub-problems so that solving the sub-problems provides a solution to the original problem. The underlying assumption is that the smaller search spaces of the sub-problems allow for faster parallel solving than addressing the original problem in its entirety.

Our research focuses on the partitioning strategy for divide-and-conquer. While divide-and-conquer can potentially outperform the best sequential performance, it hinges heavily on an effective partitioning algorithm, which remains underexplored. Conventionally, a lookahead heuristic [21] is employed, opting for variables that prune the most exploration space; this has also been studied in parallel MIP solving [29, 30]. Most partitioning algorithms adopt a pre-partitioning approach, dividing the problem into sub-problems before parallel solving [32]. This pre-partitioning approach inevitably wastes computational resources, although the multijob strategy [15] can alleviate this issue to some extent. Moreover, such partition-at-one-time strategies incur high costs to create sub-problems that are as balanced as possible. Beyond pre-partitioning, OpenSMT2 implements a dynamic partitioning method [1, 24]: the parallel solver partitions the instance dynamically on demand and allows clauses to be shared between solvers working on different instances.

Existing partitioning strategies for SMT mainly follow those for SAT, partitioning at the Boolean level (dividing the problem with different assignments to Boolean encoders), also known as the SMT term level [20, 32]. For formulas with complex logical structures, sufficient sub-problems can be generated by partitioning only at the term level. However, term-level partitioning becomes futile for formulas with a simple Boolean structure, such as almost purely conjunctive formulas, which usually appear in program verification and theorem proving involving complex theories. Note that for the bit-vector theory, both bit-level and term-level partitioning have been studied in PBoolector [27].

In view of these shortcomings, we propose a dynamic parallel framework based on arithmetic variable-level partitioning. This framework ensures full utilization of computing resources, preventing cores from idling for lack of executable tasks. The dynamic parallel framework provides flexibility for the parallel tree to grow. Thus, it can easily collaborate with other partitioning strategies — any sub-problem previously yielded by pre-partitioning strategies can be partitioned further. (Section 3)

More importantly, this is the first attempt to perform variable-level partitioning for arithmetic theories. Each time, it picks a variable and partitions the problem by dividing the feasible domain of that variable, leading to (typically two) sub-problems, which can be further simplified via constraint propagation. Our proposed variable-level partitioning permits robust, comprehensive partitioning: regardless of the Boolean structure of a given instance, our partitioning algorithm can keep partitioning until the last moment of the solving process. The variable-level partitioning strategy can also be easily applied to other theories. (Section 4.2)

The effectiveness of our partition strategy is closely related to the underlying constraint propagation techniques to simplify the sub-problems. We propose an improved version of Interval Constraint Propagation (ICP) [22, 28], named Boolean and Interval Constraint Propagation (BICP), and integrate it within our variable-level partitioning strategy. The BICP conducts arithmetic feasible interval reasoning and successfully integrates Boolean propagation, allowing stronger propagation. (Section 4.3)

We apply our techniques to three state-of-the-art SMT solvers as our base solvers: CVC5 [2], OpenSMT2 [18], and Z3 [25]. Experiments are conducted to evaluate the resulting parallel solvers on four benchmarks, comprising QF_LRA, QF_LIA, QF_NRA, and QF_NIA instances from SMT-LIB. Furthermore, we compare our variable-level partitioning strategy to the default partitioning strategies in CVC5 [32] and OpenSMT2 [19]. Overall, the experimental results show that our techniques significantly improve the performance of the sequential SMT solvers, leading to a remarkable increase in total solved instances. Moreover, our variable-level partitioning strategy exhibits superior performance and diversity compared to the best term-level strategies, particularly on purely conjunctive instances.

2 Preliminaries

2.1 Definitions and Notations

A monomial m is an expression of the form \(\prod _{i} x_i^{e_{i}}\), where \(e_i \in \mathbb {N}\) and each \(x_i\) is a real (or integer) variable. A polynomial p is a linear combination of monomials, that is, an arithmetic expression \(\sum _i a_i m_i\) where the \(a_i\) are rational numbers and the \(m_i\) are monomials. A polynomial is linear if all its monomials are linear; otherwise, it is non-linear. A quantifier-free arithmetic formula is a first-order formula whose atoms are either propositional variables or equalities, disequalities, or inequalities of the form \(p \sim b\), where \(\sim \in \{<, \le , >, \ge , =\}\) and \(b \in \mathbb {R}\).

A Conjunctive Normal Form (CNF) formula is a conjunction of clauses \(\bigwedge _i c_i\), each clause \(c_i\) being a disjunction of literals \(\bigvee _j l_j\), and each literal \(l_j\) being either an atom v or its negation \(\bar{v}\). A clause containing only one literal is called a unit clause. The length of a clause is the number of literals in it, and the length of an SMT formula is the sum of the lengths of its clauses. A key procedure in SAT and SMT solvers is the unit clause rule: if a clause is unit, its sole unassigned literal must be assigned the value true for the clause to be satisfied. The iterated application of the unit clause rule is referred to as unit propagation or Boolean constraint propagation (BCP).
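The unit clause rule can be sketched in a few lines of Python. The integer-literal clause encoding below (v for an atom, -v for its negation) is an illustrative assumption, not part of the formalism above:

```python
# A minimal sketch of Boolean constraint propagation (unit propagation) on a
# CNF formula given as a list of clauses, each clause a list of non-zero ints.

def unit_propagate(clauses):
    """Repeatedly apply the unit clause rule; return (assignment, remaining
    clauses), or (None, None) if an empty clause (a conflict) is derived."""
    assignment = {}  # variable -> bool
    changed = True
    while changed:
        changed = False
        new_clauses = []
        for clause in clauses:
            satisfied = False
            unassigned = []
            for lit in clause:
                var, want = abs(lit), lit > 0
                if var in assignment:
                    if assignment[var] == want:
                        satisfied = True  # clause already true
                        break
                else:
                    unassigned.append(lit)
            if satisfied:
                continue  # drop satisfied clauses
            if not unassigned:
                return None, None  # empty clause: conflict
            if len(unassigned) == 1:  # the unit clause rule
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
            else:
                new_clauses.append(unassigned)
        clauses = new_clauses
    return assignment, clauses
```

For example, on \(\{x_1\} \wedge \{\bar{x}_1 \vee x_2\} \wedge \{\bar{x}_2 \vee x_3 \vee x_4\}\), propagation assigns \(x_1\) and \(x_2\) true and leaves the clause \(\{x_3 \vee x_4\}\).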

2.2 Parallel SMT Solving with Partitioning

A typical parallel method is divide-and-conquer, which is based on partitioning. The satisfiability of a formula \(\phi \) can be determined in parallel by dividing it into n independent sub-problems \(\phi _1, \dots , \phi _n\), provided the disjunction \(\phi _1 \vee \dots \vee \phi _n\) is equi-satisfiable with \(\phi \): if any sub-problem is satisfiable, then the original problem is satisfiable, and if all sub-problems are unsatisfiable, then the original problem is unsatisfiable. No synchronization is necessary during solving in this simple scenario because the sub-problems are independent.

There are two main partitioning strategies: cube-and-conquer and scattering. In the cube-and-conquer [15] partitioning strategy, a set of N atoms is selected, and each of the \(2^N\) possible cubes over these atoms is used as a partitioning formula, resulting in \(2^N\) partitions. Scattering [17] is an alternative strategy that differs from cube-and-conquer in that it creates partitioning formulas that are not cubes. Instead, scattering produces a series of N partitioning formulas as follows. The first partitioning formula is some cube \(C_1\). The second is \(\lnot C_1 \wedge C_2\) for some new cube \(C_2\). The next is \(\lnot C_1 \wedge \lnot C_2 \wedge C_3\) for a new cube \(C_3\), and so on. The \(N^{th}\) partitioning formula is simply \(\lnot C_1 \wedge \dots \wedge \lnot C_{N - 1}\). The partitioning formulas are disjoint by construction, and the partitioning algorithm has considerable freedom in selecting cube variables.
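The scattering construction above can be sketched as follows; the tuple-based symbolic formula representation is purely illustrative:

```python
# Illustrative sketch of scattering: given N-1 cubes (represented here only
# by opaque labels), build the N pairwise-disjoint partitioning formulas
# C_1, then ~C_1 & C_2, ..., and finally ~C_1 & ... & ~C_{N-1}.

def scatter(cubes):
    """Return the N scattering partitioning formulas for N-1 cubes.
    A formula is ('and', [parts]), a part is ('cube', c) or ('not', part)."""
    partitions = []
    negated_prefix = []
    for cube in cubes:
        # Partition i: negations of all earlier cubes, conjoined with cube i.
        partitions.append(("and", negated_prefix + [("cube", cube)]))
        negated_prefix = negated_prefix + [("not", ("cube", cube))]
    # The final partition negates every cube.
    partitions.append(("and", negated_prefix))
    return partitions
```

With two cubes this yields three disjoint partitions: \(C_1\), \(\lnot C_1 \wedge C_2\), and \(\lnot C_1 \wedge \lnot C_2\).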

2.3 Interval Constraint Propagation

Interval Constraint Propagation (ICP) is an efficient numerical method for finding interval over-approximations of solution sets of SMT formulas, and it is particularly beneficial for non-linear systems [3, 12, 28]. The fundamental principle of ICP is to maintain a feasible interval for every variable and shrink these intervals using relatively simple constraint propagation. This technique can effectively exclude extensive portions of the search space, sometimes proving unsatisfiability. ICP has been successfully implemented in various solvers such as dReal [13], HySAT [10], and SMT-RAT [7]. The primary method of using ICP is to quickly shrink the space of solution candidates and then exploit these additional bounds in the algebraic methods. We use a straightforward example to show how ICP works.

Example 1

(Interval constraint propagation).

Case 1. Consider the constraint set \(S = \{x > 1, \ x < 4, \ xy > 4, \ yz^2 \le 4\}\). ICP contracts the feasible interval by the constraint set in the following way: Step 1. We derive \(x \in (1, 4)\) from \(x > 1 \wedge x < 4\). Step 2. By applying interval arithmetic, \(y \in (1, \infty )\) is procured from \(x \in (1, 4) \wedge xy > 4\). Step 3. We can further narrow the interval on z by maintaining its consistency with \(y \in (1, \infty ) \wedge yz^2 \le 4\), and obtain \(z \in (-2, 2)\). The resultant feasible intervals of variables after ICP are \(x \in (1, 4)\), \(y \in (1, \infty )\), and \(z \in (-2, 2)\).

Case 2. Consider the constraint set \(S' = S \cup \{2xz + y^2 < -20\}\). As in case 1, ICP yields \(x \in (1, 4)\), \(y \in (1, \infty )\), and \(z \in (-2, 2)\). We can obtain \(2xz + y^2 = 2 \times (1, 4) \times (-2, 2) + (1, \infty )^2 \in (-15, \infty )\) by interval arithmetic. The intersection of \((-15, \infty )\) and \((-\infty , -20)\) results in an empty set. Thereby, ICP detects the unsatisfiability of the constraint set \(S'\).
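The interval arithmetic of case 2 can be reproduced with a small sketch; open versus closed endpoints are ignored for brevity, and only the operations the example needs are implemented:

```python
# Illustrative interval arithmetic for Example 1, case 2. Intervals are
# (lo, hi) pairs with float('inf') for unbounded ends.

INF = float("inf")

def imul(a, b):
    """Interval multiplication via the four corner products
    (no 0 * inf corners arise in this example)."""
    corners = [x * y for x in a for y in b]
    return (min(corners), max(corners))

def iadd(a, b):
    return (a[0] + b[0], a[1] + b[1])

def isq(a):
    """Interval square: starts at 0 if the interval contains 0."""
    lo, hi = a
    if lo <= 0 <= hi:
        return (0, max(lo * lo, hi * hi))
    m = min(abs(lo), abs(hi))
    return (m * m, max(lo * lo, hi * hi))

# Example 1, case 2: 2xz + y^2 with x in (1, 4), y in (1, inf), z in (-2, 2).
x, y, z = (1, 4), (1, INF), (-2, 2)
val = iadd(imul((2, 2), imul(x, z)), isq(y))
# val is (-15, inf), disjoint from (-inf, -20): S' is unsatisfiable.
```

The computed bound \((-15, \infty)\) matches the one derived in the example by hand.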

3 Dynamic Parallel Framework Based on Arithmetic Partitioning

This section introduces our dynamic parallel framework that leverages arithmetic variable-level partitioning. We first present the framework, including the main components and how they cooperate together. Two related techniques in the framework will also be introduced, followed by an illustration example. The partitioning algorithm will be introduced in Sect. 4.2.

3.1 The Framework

As illustrated in Fig. 1, in the parallel framework, there are three classes of threads, namely, the partitioner thread, the master thread, and worker threads. The master thread schedules tasks with a pivotal data structure: task buffer.

Fig. 1. Our dynamic parallel framework.

Partitioner. The partitioner generates sub-problems (also known as tasks) and puts them into the task buffer. It receives a formula from the task buffer and picks a variable (using heuristics and information from the master) to partition the formula. This would result in two sub-problems. The sub-problems are then simplified using constraint propagation techniques. Finally, the simplified sub-problems are put into the task buffer.

Master. The master plays a crucial role in task scheduling, including task assignment, on-demand termination, and UNSAT propagation. It receives tasks generated by the partitioner, storing them in the task buffer for future assignment. The task buffer usually stores more tasks than there are computation cores available; this ensures that the master can immediately assign a task from the buffer to a worker thread as soon as a computational resource becomes available. Simultaneously, worker threads keep the master informed of the status of running tasks. There are three possible statuses of an ongoing task F:

  • UNSAT: this does not necessarily mean the original problem is unsatisfiable. Nonetheless, the master can perform UNSAT propagation (Sect. 3.2), if possible, to speed up solving.

  • SAT: the algorithm can confirm the original problem is SAT. A solution to the original SMT formula can be easily constructed by combining the solution of F and previous assignments to the variables being removed due to simplifications.

  • Running: the master will analyze the information on tasks and send a termination signal to the workers that are solving problems that are unlikely to be resolved in the near future. The specific strategies and details of termination will be elaborated later in this section.

During the solving, if the root task is proven UNSAT, it marks the end of the algorithm, regardless of whether the UNSAT is the product of solving the problem by itself or UNSAT propagating upwards from sub-tasks.

Workers. Each worker thread mainly calls a base SMT solver to solve the problem assigned to it by the master, and communicates the result to the master when it succeeds in solving the problem. Additionally, a worker thread may receive a termination signal from the master; in this case, it terminates the running task and releases its computational resources to make room for other tasks. Notably, worker threads have high flexibility in selecting and configuring their base solvers, thereby fostering diversified solving.

We would like to remark that the flexible task scheduling and partitioning strategy significantly enhance the capabilities of our framework. Various scheduling heuristics can be explored when deciding which tasks require further partitioning. By designing the scheduling heuristic, we can extend the parallel tree in a breadth-first manner, like cube-and-conquer, or a depth-first manner, like scattering; we can also extend it with dynamic scheduling.
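As a rough illustration of the task-buffer mechanism, the following deterministic Python sketch mimics the master's assignment loop; the `Master` class and its fields are our own illustrative assumptions, not the paper's implementation:

```python
# Toy sketch of the master's task-buffer scheduling: buffered tasks are
# assigned to idle cores as soon as resources are released.

from collections import deque

class Master:
    def __init__(self, num_cores, buffer_capacity):
        self.free_cores = num_cores
        self.buffer = deque()
        self.capacity = buffer_capacity
        self.running = []

    def push_task(self, task):
        """Called by the partitioner. Keeping the buffer fuller than the core
        count means assignment is never starved for tasks."""
        if len(self.buffer) < self.capacity:
            self.buffer.append(task)
            return True
        return False

    def schedule(self):
        """Assign buffered tasks to idle worker cores (FIFO here)."""
        while self.free_cores > 0 and self.buffer:
            self.running.append(self.buffer.popleft())
            self.free_cores -= 1

    def on_result(self, task):
        """A worker finished (or was terminated): release its core."""
        self.running.remove(task)
        self.free_cores += 1
```

With 2 cores and 3 buffered tasks, two tasks start immediately and the third starts as soon as a result (or a termination) frees a core.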

3.2 Partition Tree Maintenance and UNSAT Propagation

To perform UNSAT propagation, the master maintains a partition tree, which records the history of formula partitioning. The partition tree consists of task nodes containing information such as the simplified SMT formula, the parent and children tasks, and the task’s execution status (Waiting, Running, Terminated, SAT, UNSAT). Additionally, it keeps track of event timelines, including creation, execution, and termination times. As the solving progresses, the tree dynamically updates the state of its nodes. When the partitioner creates new subtasks through partitioning, they are added to the partition tree and subsequently updated with the related information.

When the master is notified of an UNSAT result of a task, it performs UNSAT propagation, if possible, to speed up solving. The UNSAT propagation can occur in two distinct directions: upward and downward.

  • The upward propagation: if a task has a parent task and all the subtasks of that parent are UNSAT, the master can infer that the parent task is UNSAT and continue propagating the UNSAT result upwards.

  • The downward propagation: when a task is proven UNSAT, the master terminates all of its subtasks, since a parent task being UNSAT implies that all its subtasks are UNSAT.
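The two propagation directions can be sketched over a toy partition tree; the dict-based node layout and field names are illustrative assumptions:

```python
# Illustrative sketch of UNSAT propagation on a partition tree.

def make_node(parent=None):
    """Toy partition-tree node; field names are our own."""
    n = {"status": "Running", "parent": parent, "children": []}
    if parent is not None:
        parent["children"].append(n)
    return n

def propagate_unsat(node):
    """Mark `node` UNSAT, propagate downward to all subtasks, and upward to
    a parent once all of its children are UNSAT."""
    if node["status"] == "UNSAT":
        return  # already propagated; also guards against re-entry
    node["status"] = "UNSAT"
    # Downward: the parent's search space covers every subtask.
    for child in node["children"]:
        propagate_unsat(child)
    # Upward: merge sibling results into the parent when possible.
    parent = node["parent"]
    if parent is not None and all(c["status"] == "UNSAT"
                                  for c in parent["children"]):
        propagate_unsat(parent)
```

When the second of two sibling subtasks is reported UNSAT, the parent becomes UNSAT too, and the wave continues toward the root until it meets a sibling that is still open.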

3.3 Terminate on Demand

In our framework, there are situations where a subtask and its parent task run concurrently. Allowing overlapping tasks to be solved simultaneously inevitably results in exploring some identical search space, which we prefer to avoid. Furthermore, terminating such hopeless tasks helps preserve available resources for more promising ones. Based on these considerations, we propose a heuristic to determine whether a running task should be terminated.

  • If both subtasks of the task are in waiting states, the task is allowed a sufficient runtime duration.

  • However, the task’s runtime should be limited if both of its subtasks have entered or been in a running state. Since the search space of the task entirely covers those of its descendant tasks, we should avoid unnecessary duplication of search wherever possible.

  • The runtime limitation for the other scenarios should lie between these two situations.
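One possible encoding of this three-case heuristic as a runtime-limit function is sketched below; the concrete limit values and scaling factors are illustrative assumptions, not the paper's tuned parameters:

```python
# Illustrative runtime-limit function for the terminate-on-demand heuristic:
# the more of a task's subtasks have started running, the less extra time
# the task itself is granted.

def runtime_limit(left_state, right_state, base_limit=1200.0):
    """Return the allowed runtime for a task given its two subtasks' states
    ('Waiting' means not yet started)."""
    started = sum(s != "Waiting" for s in (left_state, right_state))
    if started == 0:
        return base_limit          # both subtasks waiting: generous budget
    if started == 2:
        return base_limit * 0.25   # both subtasks started: tight budget
    return base_limit * 0.5        # mixed case: in between
```

A master loop would compare each running task's elapsed time against this limit and send a termination signal when it is exceeded.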

3.4 A Running Example

Fig. 2. A possible scheduling of our framework.

Consider a running state of our framework as depicted in Fig. 2(a). In this scenario, the partitioner has generated 13 tasks with various statuses. With 5 computational cores, the size of the task buffer is set to 4.

Step 1. Task 7 finishes with an UNSAT result and notifies the master. At this point, the computational resource previously allocated to task 7 is released, and the master performs both upward and downward propagation of the UNSAT result.

  • During the upward propagation, the master determines that the status of task 6 is UNSAT by merging the UNSAT results of its sub-tasks.

  • With the downward propagation, the master updates the status of tasks 11 and 12 to UNSAT.

Step 2. The master sends a termination signal to task 4 according to the heuristic in Sect. 3.3; task 4 is terminated and its computational resource is released.

Step 3. Currently, the number of waiting tasks is fewer than the number of cores, prompting the partitioner to partition task 9 with variable z, subsequently adding new subtasks 13 and 14 to the task buffer.

Step 4. The recent result notification, UNSAT propagation, and task termination free up three computational cores. Subsequently, tasks 3, 5, and 13 are assigned to worker threads based on the master’s scheduling strategy, arriving at the final running state shown in Fig. 2(b).

We omit further task partitioning for clarity and brevity in our illustration. One may notice that the “SAT” label is missing from the tree: in fact, whenever any worker thread returns a SAT result, we know the answer to the original formula is SAT, and the model can be constructed easily.

4 Variable-Level Partitioning for Arithmetic Theories

Partitioning strategies are crucial in the paradigm of divide-and-conquer parallel solving and have a significant impact on the overall efficiency of parallel solving [16, 19,20,21, 32]. This section explores a variable-level partitioning strategy based on the BICP method, making a deeper and more comprehensive exploration of arithmetic theories.

4.1 Preprocessing

For convenience of constraint propagation, we preprocess the original formula into a standard form. Note that the preprocessing is performed only once, on the original formula; all sub-formulas obtained from partitioning inherit this form. The preprocessing techniques include, but are not limited to, elimination of if-then-else operators, constant elimination, equality propagation, flattening of nested operations, and normalization of polynomial formats.

As done in [8, 12, 28], we preprocess our formulas to keep constraints in an easily managed form \(x \sim b\). The preprocessing introduces two sets of auxiliary variables: polynomial and monomial variables. Whenever a monomial \(m_i\) first occurs, an associated monomial variable \(v_i^m\) is introduced; we substitute \(m_i\) with \(v_i^m\), augmenting the original formula with the clause \(v_i^m = m_i\). Likewise, for each non-monomial atomic formula \(p_i \sim b_i\), a new variable \(v_i^p\) is defined: \(p_i\) is replaced by \(v_i^p\) in the formula, and the associated clause \(v_i^p = p_i\) is added. Moreover, we perform an enhanced normalization as follows. For convenient notation, we assume that each constraint embeds constants and coefficients that can be expressed as fractions. Consider a constraint c of the following form:

$$\begin{aligned} \sum _{i = 1}^{|p|} \frac{a_i^n}{a_i^d} v^m_i \sim \frac{b^n}{b^d}, \ a_i^n, \ a_i^d, \ b^n \ \textrm{and} \ b^d \in \mathbb {Z}. \end{aligned}$$

To simplify and normalize the polynomial form, allowing shared variable boundaries that enhance efficiency in interval contraction, we transform the constraints into the standard form:

$$\begin{aligned} \sum _{i = 1}^{|p|} a_i' v^m_i \sim b', \ a_i' \in \mathbb {Z}, \ b' \in \mathbb {Q} \ \textrm{and} \ \textrm{gcd}\left( a_1', \dots , a_{|p|}' \right) = 1. \end{aligned}$$

This leads to \(v^p = \sum _{i = 1}^{|p|} a'_i v^m_i\) with the adjusted bound \(v^p \sim b'\). To summarize, for each distinct non-linear monomial and linear left-hand side (excluding constants), a single \(v^m\) and \(v^p\) are introduced, respectively. In this way, the procedure facilitates the construction of the equations needed by the ICP algorithm and generates bounds on the introduced variables to express the potential inequalities.
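The transformation into the standard form amounts to clearing the coefficient denominators with their lcm and then dividing the resulting integer coefficients by their gcd, scaling the bound accordingly; a sketch (function name is ours) follows:

```python
# Illustrative normalization of a constraint sum(a_i * v_i) ~ b with
# fractional coefficients into integer coefficients with gcd 1.
# Both scaling factors are positive, so the relation ~ is preserved.
# Requires Python 3.9+ for variadic math.gcd / math.lcm.

from fractions import Fraction
from math import gcd, lcm

def normalize(coeffs, bound):
    """coeffs: list of Fractions a_i^n / a_i^d; bound: Fraction b^n / b^d.
    Returns (integer coefficients with gcd 1, rational bound b')."""
    scale = lcm(*(c.denominator for c in coeffs))   # clear denominators
    ints = [int(c * scale) for c in coeffs]         # exact: scale is the lcm
    g = gcd(*ints)                                  # divide out common factor
    return [a // g for a in ints], bound * scale / g
```

For instance, \(\frac{1}{2}x + \frac{3}{4}y \le 5\) becomes \(2x + 3y \le 20\), and \(2x + 4y \le 6\) becomes \(x + 2y \le 3\).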

4.2 The Partitioning Algorithm

Algorithm 1.

When the master notifies the partitioner to perform partitioning, the partitioner runs Algorithm 1 to generate two subtasks and sends them to the task buffer. The partition tree is then updated accordingly.

Choose a Formula to Partition. The partitioner thread maintains a partition tree, as the master does. It iteratively chooses a leaf node from the tree and performs partitioning on the corresponding formula in the node, following the heuristic below:

  • prefers a leaf node of the lowest level (that is, a leaf node closest to the root node)

  • if there is more than one such leaf node, ties are broken by preferring the one with the most clauses, and remaining ties by the length of the formula.

We would like to note that it is not mandatory for the partitioner to maintain the partition tree, as we already have the tree in the master. The reason we choose to also maintain the partition tree in the partitioner thread is mainly to reduce the communication cost and thus improve efficiency.

Choose an Arithmetic Variable for Partitioning. Given a formula, a specific variable is selected for partitioning. In the following, we discuss how to select partitioning variables and conduct our variable-level partitioning in arithmetic theories. The selection avoids the auxiliary variables introduced in the preprocessing for ICP. When selecting the partitioning variable, we consider the following variable features in the formula, prioritized by their importance as established in our experimental validation. We design a multi-tier heuristic: if the primary indicator is identical, the secondary indicator is considered, and so forth.

  1. The highest degree of the variable in the constraints.

  2. The occurrence frequency of the variable in the simplified formula.

  3. The partitioning times of the variable.

With our implementation and evaluation, we rarely encounter variables that fall back to the third indicator, which means that the first two features are crucial indicators within our selection procedure.
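Under the assumption that the third indicator prefers variables that have been partitioned fewer times (the paper does not spell out the direction), the multi-tier heuristic reduces to a lexicographic maximum; a sketch with an illustrative variable record:

```python
# Illustrative multi-tier variable selection: higher degree first, then
# higher occurrence frequency, then (assumed) fewer past partitionings.
# The dict-based variable record is our own.

def pick_partition_variable(variables):
    """variables: list of dicts with 'degree', 'freq', 'splits'.
    Python compares the key tuples lexicographically, which realizes the
    primary/secondary/tertiary indicator ordering."""
    return max(variables,
               key=lambda v: (v["degree"], v["freq"], -v["splits"]))
```

With ties on degree and frequency, the variable split fewest times wins; as noted above, this third tier is rarely reached in practice.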

Create Two Sub-Formulas via Interval Partitioning. Once the partitioning variable has been decided, creating partitions with reference to this variable is another pivotal aspect of the variable-level partitioning. In this context, ub and lb represent the existing upper bound and lower bound of the selected variable. The partitioner performs partitioning depending on the feasible interval of the variable as follows:

  1. Strictly containing 0: a partitioning is made at the value 0.

  2. Both an upper and a lower bound: proportionate partitioning is performed on demand, usually at the midpoint \((lb + ub) / 2\).

  3. Only an upper bound or a lower bound: partitioning at \((ub - penalty)\), where penalty is a parameter; the lower-bound-only case is symmetric, at \((lb + penalty)\).

  4. Neither an upper nor a lower bound: the interval contains 0, so case 1 applies.
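The four cases can be collapsed into one illustrative split-point function; `penalty` is the parameter named in case 3, the default value is our own assumption, and the case ordering mirrors the list above:

```python
# Illustrative split-point selection for variable-level partitioning.
# lb/ub are the variable's current bounds; float('inf') marks a missing bound.

INF = float("inf")

def split_point(lb, ub, penalty=1.0):
    """Return the value at which the variable's feasible interval is split."""
    if lb == -INF and ub == INF:   # case 4: no bounds -> contains 0 (case 1)
        return 0.0
    if lb < 0 < ub:                # case 1: interval strictly contains 0
        return 0.0
    if lb > -INF and ub < INF:     # case 2: bounded on both sides
        return (lb + ub) / 2
    if ub < INF:                   # case 3a: upper bound only
        return ub - penalty
    return lb + penalty            # case 3b: lower bound only
```

The two sub-problems are then the formula with the variable restricted below and above the returned point, respectively.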

4.3 BICP in Arithmetic Partitioning

This section proposes an enhanced constraint propagation method called Boolean and Interval Constraint Propagation (BICP). Typically, ICP is implemented within the theory solver of an SMT solver, where it performs constraint propagation on conjunctive arithmetic constraints, sometimes leading to an UNSAT result.

We extend ICP to handle constraints with a disjunctive structure by combining it with Boolean constraint propagation. The algorithm collects variable information from Boolean constraint propagation, including the feasible intervals of arithmetic variables, monomials, and polynomials, as well as the assignments of Boolean variables. BICP uses the feasible intervals of arithmetic variables to calculate the feasible intervals of the monomials and polynomials containing them, which is straightforward; conversely, it exploits the feasible intervals of monomials and polynomials to tighten the intervals of the variables within them. BICP involves both Boolean constraint propagation and numerical algorithms, a typical example being Newton’s method for interval arithmetic [14]. During propagation, BICP efficiently contracts the feasible intervals of integer variables. Sometimes BICP leads to an UNSAT result, in which case the task is directly discarded. An intuitive explanation of its strength is that BCP can trigger unit propagation and collect more arithmetic constraints, thus enhancing the propagation ability; without BCP, the ICP process may terminate early. Some examples of the techniques in BICP are listed in Table 1.

Table 1. Techniques implemented in BICP.

Exhaustive propagation is expensive due to the “slow convergence” of ICP [4]. In practice, the algorithm runs for a predefined number of iterations, or stops when the improvement falls below a given threshold.

The SMT formula (i.e., a task) is then simplified according to the result of BICP, which may change the domains of the variables. The simplification procedure comprises three sub-procedures: clause reduction, feasible domain contraction of variables, and literal propagation.

  • Reduce Clauses. We examine each clause of the original formula. This involves examining the truth value of the literals in the clause. For arithmetic literals, recalling that after preprocessing (Sect. 4.1), the polynomial in any arithmetic literal \(\ell \) has been replaced with an auxiliary arithmetic variable x, we calculate the truth value of \(\ell \) as follows: if the feasible interval of x is a subset of the feasible interval represented by the literal \(\ell \), then \(\ell \) is True; if there is no intersection between these two domains, then literal \(\ell \) is False; in other cases, we do not know the value of the literal. If a clause contains any literal whose value is True (due to propagation), the clause is satisfied, and thus it is removed. Conversely, we formulate a new clause from all literals with unknown status and add it to the task. If some clause becomes empty, it means the formula is UNSAT, and thus, the formula can be discarded.

  • Express Feasible Domains as Constraints. We gather all assignments of Boolean variables and feasible intervals of arithmetic variables obtained from the constraint propagation. Each of them is expressed by a constraint and added to the formula.

  • Address Literals Assigned by BCP. We avoid adding constraints about the feasible intervals of auxiliary variables, which stand for monomials and polynomials, into the task. To ensure that the simplified task has the same solution as the given formula, we need to collect the literals assigned by Boolean constraint propagation. This part could lead to redundancies within the task. So, we eliminate constraints dominated by others through a simple detection, ensuring the accuracy and conciseness of the task.
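The literal-evaluation rule of the clause-reduction step can be sketched as follows, restricting to closed intervals and the operators <= and >= for brevity; all names and encodings are illustrative:

```python
# Illustrative literal evaluation against feasible intervals, as used in the
# "Reduce Clauses" step: a literal (x op b) is True if x's interval is a
# subset of the interval the literal denotes, False if they are disjoint,
# and undetermined (None) otherwise.

INF = float("inf")

def literal_value(var_iv, op, b):
    """Return True / False / None for the literal (x op b)."""
    lo, hi = var_iv
    lit_lo, lit_hi = {"<=": (-INF, b), ">=": (b, INF)}[op]
    if lit_lo <= lo and hi <= lit_hi:
        return True   # variable interval is a subset of the literal's
    if hi < lit_lo or lo > lit_hi:
        return False  # the two intervals are disjoint
    return None       # undetermined

def reduce_clause(clause, intervals):
    """clause: list of (var, op, b) literals. Returns None if the clause is
    satisfied (and thus removed), else the clause restricted to undetermined
    literals; an empty result signals a conflict (UNSAT formula)."""
    remaining = []
    for var, op, b in clause:
        v = literal_value(intervals[var], op, b)
        if v is True:
            return None  # some literal is True: drop the clause
        if v is None:
            remaining.append((var, op, b))
    return remaining
```

For \(x \in [1, 4]\), the literal \(x \le 0\) evaluates to False, \(x \le 5\) to True, and \(x \ge 2\) stays undetermined, so the clause \((x \le 0 \vee x \ge 2)\) shrinks to \((x \ge 2)\).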

Example 2

(BICP and formula simplification). Consider the SMT formula \((x > 1) \wedge (x < 4) \wedge (xy > 4) \wedge (yz^2 \le 4) \wedge (\lnot a \vee x < -2) \wedge (y > 0 \vee x^2 z + y = 3) \wedge (a \vee x^2 \ge 4 \vee y > 5)\). From Example 1 in Sect. 2.3, we can derive \(x \in (1, 4), y \in (1, \infty ),\) and \(z \in (-2, 2)\) by Boolean constraint propagation and interval arithmetic. Then, we infer \((x < -2) \mapsto \textrm{False}\) and propagate \(\lnot a\). After BICP, the status of the given formula is:

$$\begin{aligned} (xy > 4 \wedge yz^2 \le 4) & \mapsto (\textrm{True} \wedge \textrm{True}), \\ (\lnot a \vee x < -2) & \mapsto (\textrm{True} \vee \textrm{False}), \\ (y > 0 \vee x^2 z + y = 3) & \mapsto (\textrm{True} \vee \textrm{Unknown}), \\ (a \vee x^2 \ge 4 \vee y > 5) & \mapsto (\textrm{False} \vee \textrm{Unknown} \vee \textrm{Unknown}). \end{aligned}$$

So, the task after simplification is:

$$\begin{aligned} \underbrace{(x^2 \ge 4 \vee y > 5)}_{\text {Reduced Clauses}} \wedge \underbrace{(\lnot a \wedge x \in (1, 4) \wedge y \in (1, \infty )\wedge z \in (-2, 2))}_{\text {Feasible Domain of Variables}} \wedge \underbrace{(xy > 4 \wedge yz^2 \le 4)}_{\text {Propagated Literals}}. \end{aligned}$$

5 Evaluation

5.1 Evaluation Preliminaries

In this work, we use our method to improve SMT solvers and conduct extensive experiments to evaluate the method’s effectiveness. This subsection introduces the experiment setup, including implementation, benchmarks, base solvers, running environment, and reporting methodology.

Implementation: The partitioner is developed in C++ on top of Z3’s in-development “subpaving” module. We refined the code to support our variable-level partitioning and constraint propagation, and fixed several bugs; we also implemented targeted adaptations and performance enhancements within the “subpaving” module to improve the capability of our method. The master is implemented in Python for task management and scheduling in parallel solving.

Benchmarks: The experiments are carried out on four non-incremental arithmetic benchmarks from SMT-LIB: 1753 instances from QF_LRA, 13226 instances from QF_LIA, 12134 instances from QF_NRA, and 25358 instances from QF_NIA.

Base Solvers: As the foundation of our research, we choose three state-of-the-art SMT solvers as the base solvers (i.e., the worker threads): CVC5 (v1.0.8) [2], OpenSMT2 (v2.5.2) [18], and Z3 (v4.12.1) [25]. CVC5 and Z3 have consistently demonstrated superior performance across numerous theories and tracks of SMT-COMP over several consecutive years. In contrast, OpenSMT2 is a solver specifically oriented towards parallel and distributed solving and performs well on linear arithmetic theories. CVC5 and Z3 support all four benchmark theories, while OpenSMT2 supports only QF_LRA and QF_LIA. For comparison, we also test the previous parallel versions of these solvers.

Experiment Setup: All experiments are conducted on servers running Ubuntu 20.04.4 LTS, each with 1 TB of RAM and two AMD EPYC 7763 CPUs with 64 cores per CPU. Each solver performs one run per instance with a cutoff time of 1200 CPU seconds. For each solver on each benchmark, we report the number of solved SAT/UNSAT instances, total failed instances, and total solved instances, denoted as “SAT”, “UNSAT”, “Failed”, and “Solved”. Furthermore, we present the penalized run time “PAR-2”, as used in SAT Competitions, where the run time of a failed run is penalized as twice the cutoff time, and the PAR-2 score improvement “Improve” compared to sequential solving. We use the PAR-2 score because it provides a single metric that incorporates both run time and the number of benchmarks solved. The CPU time consumed by the experiments exceeds 20 CPU years. Our experiments primarily use sequential solving and parallel solving with 8 and 16 cores. Although our method supports parallelism with an arbitrary number of cores, we use powers of 2 because some existing comparative strategies, such as cube-and-conquer, only accommodate core counts that are powers of 2. Moreover, this choice balances the advantage of the dynamic parallel framework against practical time and equipment constraints: the framework’s benefits might not be adequately exhibited with fewer cores, while employing many more cores would incur an unacceptable CPU time cost.
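The PAR-2 metric mentioned above is straightforward to compute; the following sketch (our own, with a 1200-second cutoff as in the experiments) shows it for a list of runs.

```python
# Illustrative PAR-2 computation (our own sketch, not the paper's
# evaluation script): solved runs contribute their run time; failed
# runs are penalized with twice the cutoff time.

CUTOFF = 1200  # cutoff in CPU seconds, as used in the experiments

def par2(results):
    """results: list of (solved: bool, runtime_seconds: float)."""
    return sum(t if ok else 2 * CUTOFF for ok, t in results)
```

Lower PAR-2 scores are better, since both slow runs and failures increase the total.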

Finally, we provide our solver, evaluation scripts, and related experimental results in a GitHub repository (see Footnote 2), so that interested readers can use our solver and explore the experimental details further.

5.2 Comparison to Sequential Solving

This part of the evaluation focuses on testing the effectiveness of our variable-level partitioning in augmenting and accelerating the solving capabilities of different sequential solvers across various theories. The notation Solver(S) means sequential solving of the SMT solver “Solver”, and Solver(AP-pX) is the notation for employing “Solver” within our method AP with X cores.

Table 2. Comparison to sequential solving in benchmarks of linear theories, where the solvers employing our partitioning method are denoted with “AP-p”, followed by the number of cores.
Table 3. Comparison to sequential solving in benchmarks of non-linear theories.

The results from the different base solvers and their performance within our method are summarized in Table 2 and Table 3. Moving from sequential solving to 8-core parallel solving with our method, all base solvers solve more instances and achieve significantly better PAR-2 scores. Specifically, the 8-core parallel solvers solve on average 25.3, 301, 353, and 2816 more instances than their sequential versions on QF_LRA (1753), QF_LIA (13226), QF_NRA (12134), and QF_NIA (25358), respectively, and the PAR-2 scores improve by 22.4%, 15.7%, 35.0%, and 32.4% on average. Overall, our parallel method with 8 cores solves 1211 additional instances (out of 6247) that no single solver could solve without our partitioner, substantially improving solving ability.

Moving from 8-core to 16-core parallel solving with our proposed method, we observe further significant performance improvement. This preliminary evidence underscores the promising scalability of our approach. Time and computational resources, unavoidable constraints in parallel experimentation, limited our exploration to 16 cores. Nevertheless, the current experimental results show no saturation of solving performance at this number of cores, suggesting that extending beyond 16 cores would continue to improve solving performance and justify the additional resource cost.

Further, we find that the improvement varies considerably across theories. The improvements are more significant on non-linear arithmetic instances than on linear ones. This result aligns with the intention of ICP, which is mainly designed to speed up solving non-linear arithmetic constraints. For linear instances, the improvement mainly comes from SAT instances; for non-linear instances, the improvement is also noticeable on UNSAT instances. From the standpoint of solvers, the improvements from our method are evenly distributed between SAT and UNSAT instances for Z3, whereas for CVC5 and OpenSMT2 the enhancements are primarily evident on SAT instances.

5.3 Comparison to State-of-the-art Partitioning Strategies

Table 4. Comparison to state-of-the-art partitioning strategy.

We compare our variable-level partitioning strategy to the state-of-the-art partitioning strategies evaluated in [32]: decision-cube, the best strategy in CVC5, and scattering, the best strategy in SMTS [1, 24], the parallel version of OpenSMT2. The notation CVC5(pX) refers to the decision-cube strategy in CVC5 with X cores, OpenSMT2(pX) stands for the scattering strategy in OpenSMT2 with X cores, and Solver(AP-pX) denotes employing “Solver” within our method AP with X cores.

Table 4 compares the results of our method with both competitor strategies; for fairness, each strategy is compared with ours on the same underlying SMT solver. Since Z3 lacks an appropriate divide-and-conquer parallel strategy for testing, we report the performance of Z3 within our method separately in the table for reference.

Overall, our method performs best, outperforming all other strategies except for the 8-core execution on the QF_LIA theory. It performs particularly well on non-linear theories, showing significant improvements over the competitors. For example, with 8 cores, CVC5 with our method solves 216 (out of 777 unsolved) and 1551 (out of 8375 unsolved) more instances than the decision-cube strategy on the QF_NRA and QF_NIA theories, respectively.

For the QF_LIA theory, our method is competitive with and complementary to OpenSMT2: there are 51 instances that OpenSMT2 fails to solve but our method solves quickly, and 181 instances where our method is more than 10 times faster. We also observe that the performance of OpenSMT2 degrades slightly from 8 to 16 cores on the QF_LRA and QF_LIA benchmarks. A possible explanation is that the benchmarks are not filtered for parallel-friendly instances, resulting in performance decrements on specific instances.

5.4 Improvement on Pure-Conjunction Formulas

Lastly, we focus on pure-conjunction instances to empirically validate the effectiveness of our proposed variable-level partitioning strategy. Notably, term-level partitioning strategies fail to generate partitions on these instances and thus provide no parallel acceleration. We select instances where, after Z3 preprocessing and Boolean constraint propagation, all abstract Boolean variables have already been assigned. In other words, the propositional engine in the SMT solver makes no (or almost no) decisions for the original formula during sequential solving.
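The selection criterion above can be illustrated with a simplified check. In this sketch (our own simplification, not the paper's filtering code), each arithmetic atom is abstracted to a Boolean variable, clauses are lists of DIMACS-style integer literals, and an instance counts as pure-conjunction if unit propagation alone assigns every variable.

```python
# Rough illustration of the pure-conjunction filter (our own
# simplification; the actual filter runs on Z3's preprocessed formula).

def unit_propagate_all_assigned(clauses, n_vars):
    """clauses: list of lists of ints (DIMACS-style literals)."""
    assignment = {}          # var -> bool
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue     # clause already satisfied
            unassigned = [l for l in clause if abs(l) not in assignment]
            if len(unassigned) == 1:           # unit clause: literal forced
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return len(assignment) == n_vars
```

For example, a conjunction of unit clauses is fully assigned by propagation, whereas a genuine disjunction leaves its variables undecided.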

In detail, pure-conjunction instances account for 22.8% (11957 out of 52471) of all instances across the arithmetic theories: 19.2% (337 out of 1753) within QF_LRA, 30.7% (4066 out of 13226) within QF_LIA, a significant 49.7% (6034 out of 12134) within QF_NRA, and 6.0% (1520 out of 25358) within QF_NIA. The following experiments compare the run time of our variable-level partitioning strategy against the state-of-the-art strategies in CVC5 and OpenSMT2 on these pure-conjunction instances.

Fig. 3. Run time comparison with partitioning strategies on pure-conjunction instances.

As displayed in Fig. 3(a), comparing run time on linear instances between our strategy and the best strategy in CVC5, we observe a speed-up of more than 10X on 92 instances and a slowdown of more than 10X (below 0.1X) on only 12 instances. For the comparison with OpenSMT2 in Fig. 3(b), our partitioning strategy is competitive with OpenSMT2 on linear instances: our method is more than 10 times faster on 152 instances and more than 10 times slower on 12. Finally, as shown in Fig. 3(c), our method significantly improves on non-linear instances against CVC5: it succeeds on 265 instances where CVC5 fails, while failing on only 11 instances where CVC5 succeeds. The performance improvement is impressive: the numbers of instances with speed-up ratios above 100X, above 10X, below 0.1X, and below 0.01X are 322, 533, 91, and 32, respectively.

In summary, the evaluation results confirm the potential of the variable-level partitioning strategy on pure-conjunction instances. Beyond these, our method also stands out on almost-pure-conjunction instances, which account for an even higher percentage of the benchmarks.

6 Conclusion and Future Work

In this paper, we proposed the first variable-level partitioning strategy for parallel solving of the arithmetic theories of SMT. Its two main ideas are a dynamic parallel framework and a variable-level partitioning strategy with enhanced constraint propagation. Using our partitioning strategy, we developed parallel versions of three leading SMT solvers. Extensive experiments showed that our variable-level partitioning strategy outperforms the best divide-and-conquer parallel strategies on all arithmetic theories and is significantly better on non-linear arithmetic theories.

Our strategy can be extended to other SMT theories with theory-specific customization. Further research is needed to devise a comprehensive variable-level partitioning strategy applicable across various SMT theories. It would also be interesting to combine our method with term-level partitioning to further improve the performance of divide-and-conquer parallel solving.