
1 Introduction

The B-method [1, 2] is based on predicate logic, arithmetic and set theory. Particularly when using Unicode symbols, its core syntax is very close to standard mathematical notation. The ProB validation tool [29] for B can bring this mathematics to life [28]. We have been using this fact to develop a variety of interactive teaching materials for an undergraduate theoretical computer science course, in particular via Jupyter notebooks [15]. In many cases the mathematical formulas in the course script [36] are valid B formulas or need only minor adaptations. These notebooks cover topics like finite automata, automata determinisation and minimisation, parsing algorithms, Turing machines, various Gödelisations and conversions of grammars to automata models and back.

In the summer of 2023 we covered Cook’s theorem [8] in more detail for the first time and developed an accompanying B model for it. Cook’s theorem states that SAT solving (satisfiability of propositional logic formulas) is NP-complete. The proof shows that a successful run of a non-deterministic Turing machine (solving a given NP-problem) can be modelled as the solution of a propositional logic formula. The translation rules to SAT in [8] and [36] are written as quantified logic formulas (using universal and existential quantification over time points, tape contents and states of the Turing machine). These formulas can be encoded elegantly in B, and we could take them almost verbatim from the script of the lecture [36] to develop a B encoding of the proof. As we will see in Sect. 2, one can thus visualise solutions of the translated SAT problem, giving students a better intuition about Cook’s theorem. In the process of formalisation, we also found a few subtle mistakes in the script [36].

More important, however, is the realisation that this mathematical style of describing a SAT problem is useful for other, new applications of formal methods we were working on (from biology, data mining and railways). This led to the development of the new solving backend B2Sat. B2Sat intertwines the solving of high-level, higher-order B predicates with solving “bare-metal” SAT problems. When applied to the B encoding of Cook’s theorem it creates and solves exactly the SAT problem described in [8, 36] and translates the SAT solution back to B.

In the rest of the paper we first show more details about Cook’s theorem and how to describe the underlying SAT problems in predicate logic. We then describe the technique and implementation of our new solving backend B2Sat. We show that B2Sat has applications for complex constraint solving and optimisation tasks, enabling convenient modelling in B, TLA\(^{+}\) or Z and effective solving in SAT. For some applications at least, it is considerably more efficient than existing solvers for B, TLA\(^{+}\) or Z.

2 Cook’s Theorem in B

Cook’s theorem [8] states that SAT solving is NP-complete. The theorem is still important 50 years later, see, e.g., [34].

Recall that a set A of words is in NP if there is a non-deterministic Turing machine \(M_{A}\) which accepts A in polynomial time. The proof of NP-hardness in [8] shows that accepting computations of \(M_{A}\) for a particular input x correspond to solutions of a propositional logic formula \(F_{x}\). The formula \(F_{x}\) is derived from x and the Turing machine \(M_{A}\). Nowadays we would call this process bounded model checking of the Turing machine \(M_{A}\). Indeed, the number of computation steps of the Turing machine is bounded by some polynomial \(\pi _{A}\) over the size of the input. As such, the reachable cells of the Turing machine’s infinite tape are also bounded for a given input x, making it possible to encode the whole computation as a propositional logic formula.

We now present a B encoding of the core of the proof, based on encoding the translation rules deriving the SAT formula \(F_{x}\) from \(M_{A}\) and the input x. We here assume that \(M_{A}\) is a Turing machine with one tape. The translation rules to SAT are represented as predicate logic formulas; these formulas can be encoded elegantly in B, and are taken almost verbatim from [36].

A Turing machine consists of a working alphabet \(\varGamma \), containing the input alphabet \(\varSigma \) and the blank symbol, a set of states S, an initial state \(z_{0}\), and a set \(F\subseteq S\) of final states. In addition we have the transition relation \(\delta \in (S \times \varGamma ) \leftrightarrow (S \times \varGamma \times Dir)\), where \(Dir=\{L,R,N\}\) is the set of possible head movements (left, right, neutral). Here, \((z,y) \mapsto (z',y',d) \in \delta \) means that if the machine is in state \(z\in S\) and the tape contains the symbol \(y\in \varGamma \) at the head position, then the machine can change its state to \(z'\), write \(y'\) at the current tape position (overwriting y) and move its head according to d. Note that we model \(\delta \) here as a relation, and the Turing machine can be non-deterministic.

We now assume that \(M_{A}\) accepts a set \(A\in NP\) (i.e., a problem in NP). The number of steps for accepting an input \(x\in A\) is then bounded by some polynomial \(\pi \). Hence, we model the set of time points as \( Time = 0 \mathbin {..}\pi (n)\), where n is the size of the input x. In that time span the Turing machine can move only a bounded number of steps left or right, so we have a set of reachable tape positions \( Pos = -\pi (n) \mathbin {..}\pi (n)\).

We can specify the set of valid computation paths of the Turing machine which accept x by the formula \(F_{x}\). The propositional logic variables of \(F_{x}\) depend on the size of the input x and are modelled as these three total functions in B:

  • \(state \in Time \times States \rightarrow BOOL \), encoding the state of the Turing machine at each time point \(t\in Time \),

  • \(head \in Time \times Pos \rightarrow BOOL \), encoding the position of the head on the tape at each time point,

  • \(tape \in Time \times Pos \times \varGamma \rightarrow BOOL \), encoding the contents of the tape (over the working alphabet \(\varGamma \), which includes the blank symbol) at each time point.
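To get a feeling for the size of \(F_{x}\), the number of propositional variables can be counted directly from the three function signatures above. The following Python sketch is only an illustration: the function name and the concrete dimensions are our assumptions. In particular, the 25 tape positions are chosen so that the count matches the 1664 variables reported for Fig. 1; the text does not state how many positions were modelled there.

```python
def count_sat_variables(n_time: int, n_pos: int, n_states: int, n_symbols: int) -> int:
    """Number of propositional variables needed for state, head and tape."""
    state_vars = n_time * n_states          # state : Time x States -> BOOL
    head_vars = n_time * n_pos              # head  : Time x Pos -> BOOL
    tape_vars = n_time * n_pos * n_symbols  # tape  : Time x Pos x alphabet -> BOOL
    return state_vars + head_vars + tape_vars


# Assumed dimensions: 16 time points, 25 positions, 4 states, 3 tape symbols.
print(count_sat_variables(16, 25, 4, 3))  # 64 + 400 + 1200 = 1664
```

The tape function dominates the count, which is why the number of modelled positions matters most for the size of the SAT problem.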

The formula \(F_{x}\) is a conjunction \(S \wedge K \wedge U1 \wedge U2 \wedge E\) of the following five subformulas, which we partially describe below:

  1. S

    The initial condition specifying that \(head(0,0)= TRUE \), \(state(0,z_{0}) = TRUE \) and the initial tape contents.

  2. K

    A correctness condition stating that the Turing machine

    • can only be in a single state:

      \(\forall t.(t\in Time \Rightarrow card(\{s\mid state(t,s)= TRUE \})=1)\),

    • can only have a single head position and a single symbol at every tape position.

  3. U1

    Updating the state, head position and the tape’s contents at the head.

  4. U2

    The frame axiom, stating that tape contents not at the head remain unchanged: \(\forall (t,i,a).((t\in Time ^{+}\wedge i\in Pos \wedge a\in \varGamma ) \Rightarrow ((head(t,i) = FALSE \wedge tape(t,i,a)= TRUE ) \Rightarrow tape(t+1,i,a)= TRUE ))\) where \( Time ^{+}\) is all but the last time point.

  5. E

    ensuring we reach a final state:

    \(\exists s.(s\in Final \wedge state(\pi (n),s)= TRUE )\).
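To illustrate how such a quantified formula unfolds into propositional clauses, consider the frame axiom U2: for fixed t, i and a, the implication \((head(t,i)= FALSE \wedge tape(t,i,a)= TRUE ) \Rightarrow tape(t+1,i,a)= TRUE \) is equivalent to the single clause \(head(t,i) \vee \lnot tape(t,i,a) \vee tape(t+1,i,a)\). The Python sketch below enumerates these clauses for a small, made-up instance. The function name, the literal representation and the dimensions are assumptions for illustration, not part of the B model.

```python
from itertools import product


def frame_axiom_clauses(times, positions, alphabet):
    """Yield one clause per (t, i, a) for the frame axiom U2.

    A literal is a pair (variable, polarity); each clause reads
    head(t,i) or not tape(t,i,a) or tape(t+1,i,a).
    """
    last = max(times)
    for t, i, a in product(times, positions, alphabet):
        if t == last:  # Time+ excludes the last time point
            continue
        yield [
            (("head", t, i), True),
            (("tape", t, i, a), False),
            (("tape", t + 1, i, a), True),
        ]


# Made-up instance: 4 time points, positions -3..3, three tape symbols.
clauses = list(frame_axiom_clauses(range(0, 4), range(-3, 4), ["0", "1", "BLANK"]))
print(len(clauses))  # 3 * 7 * 3 = 63 clauses
```

Dropping these clauses from the conjunction corresponds to removing U2 from \(F_{x}\).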

Parts of the B model can also be found in Fig. 2. U1 is the most complicated formula, and was not written out in [8]. U1 contained four mistakes in [36], which went undetected for at least 15 years. For example, [36] did not ensure that we do not step outside the set of modelled positions.

We can now bring this formula to life by letting ProB find solutions for state, head and tape which satisfy \(F_{x}\) for a particular Turing machine and a particular input. In Fig. 1 we show the graphical rendering of one such solution for a Turing machine with 4 states, the input “100” (1 is green, 0 is red and BLANK is white) and modelling 16 computation steps (from left to right).

The B model can now be used to show the students the importance of the various subformulas of \(F_{x}\). For example, Fig. 1 on the right shows what happens when we drop the frame axiom U2, meaning that untouched tape contents can change willy-nilly at each step.

Fig. 1. Two solutions of \(F_{x}\), showing SAT variables for state, head and a condensed view of tape. Time t progresses from left to right in each case. The SAT problems have 1664 variables. For the solution on the right, the frame axiom U2 was removed from \(F_{x}\).

As we have seen, ProB’s default solver can solve and visualise our B model in Fig. 1, and thus implicitly solve the underlying SAT problem. This SAT problem has 1664 variables and a solution is found in about two seconds (when increasing the solver strength preference to ensure cardinality constraints are all reified, cf. [18]). But ProB is not a dedicated SAT solver, and will certainly struggle with larger underlying SAT problems.

ProB also provides other constraint solving backends. In particular, ProB’s Kodkod backend [35] translates B models to SAT via the Kodkod library [41]. It is an ahead-of-time static translation for a first-order subset of B, which unfortunately fails here, in particular because \(\delta \) is not a binary relation. So this backend is not applicable to our encoding of Cook’s theorem. Unfortunately, we were also not able to successfully apply ProB’s Z3 backend [37] here. (We provide a detailed discussion of these backends in Sect. 6.)

So, given that our B model actually specifies the generation of a SAT formula, wouldn’t it be nice if we somehow could generate \(F_{x}\) on the fly in B and then call a dedicated SAT solver on \(F_{x}\)?

This is exactly the contribution of this paper: a technique to process the above B model by a combination of ProB’s default solver with a dedicated SAT solver. The approach enables one to use the full power of B, including higher-order functions and relations, during pre- and post-processing, while still using a propositional logic SAT solver for the core solving. Figure 2 shows parts of the B model, highlighting in green the parts that were expanded and translated for the SAT solver and in red the parts that were fully processed by ProB’s solver.

Fig. 2. B2Sat Translation Coverage Feedback in ProB: green parts were translated to SAT, red parts were processed by regular constraint solving.

3 The B2Sat Approach

The B2Sat approach is not an ahead-of-time translation to SAT like [35] or [21], but a dynamic translation during solving with ProB’s default solver.

ProB’s default solver is written in Prolog and scales to very large and complex data values. It has good deterministic propagation and is ideal for animation and data validation [30]. The boolean part of ProB’s solver is inspired by [19, 20], but is not based on CNF and does not use watched literals. More importantly, it has neither clause learning nor conflict analysis.

The essence of the B2Sat approach is to intertwine a B to CNF conversion algorithm with ProB’s constraint solver. ProB’s constraint solver thus runs both before and after a SAT solver is called on the CNF conversion. ProB’s solver can thus be used to expand quantifiers and pre-compute complex expressions, without which a SAT conversion would be impossible. The solver can also run after a SAT solution has been found, to check additional constraints, perform additional computations (e.g., for visualising the result) or drive optimisation.

The approach is depicted in Fig. 3 and consists of the following phases:

  • the deterministic propagation phase(s) of ProB’s solver: it performs deterministic propagations and can expand quantifiers and total functions. It actually consists of two phases: in the first one it tries to generate only fully known values and tries to represent known sets as AVL trees [22] for efficient lookup. The second phase is still deterministic, but can also generate partially known values (like our total functions state, head and tape).

  • a compilation phase, whereby static values are inlined. ProB’s compilation is normally used for symbolic values (like infinite functions), creating a closure where all referenced values are inlined. This closure can then be evaluated later, without having access to the original state. Here we perform the compilation explicitly to simplify the formulas, in order to enable the next phase.

  • the B to CNF conversion proper, which can translate a subset of B to propositional logic in conjunctive normal form. This phase currently supports: equalities and inequalities between boolean variables and constants, all logical connectives and some cardinality constraints (see below). Subformulas that cannot be translated are sent to ProB’s default solver and linked to the CNF via an auxiliary propositional variable.

  • a solving phase, where the generated CNF is sent to an external SAT solver.

  • propagation of the SAT solution to B, by progressively “grounding” the B values and predicates linked to the propositional variables.

  • complete constraint solving, solving the pending constraints in B by now performing the regular non-deterministic propagations and enumerations of ProB. In case of failure, we backtrack and add additional SAT constraints to prevent the unfruitful solution.
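The interplay between the last three phases can be pictured as a loop around the SAT solver. The following Python sketch is only an illustration under stated assumptions: `solve_cnf` is a brute-force stand-in for an external solver such as Glucose, `residual_check` is a hypothetical callback standing for the constraints that remain with ProB, and blocking the complete failed model is just one simple way of adding SAT constraints. The text does not specify which clauses B2Sat actually adds.

```python
from itertools import product


def solve_cnf(clauses, n_vars):
    """Brute-force stand-in for an external SAT solver; returns a model or None."""
    for bits in product([False, True], repeat=n_vars):
        model = {v: bits[v - 1] for v in range(1, n_vars + 1)}
        if all(any(model[abs(l)] == (l > 0) for l in clause) for clause in clauses):
            return model
    return None


def solve_with_residual(clauses, n_vars, residual_check):
    """Alternate SAT solving with a residual check, blocking models that fail it."""
    clauses = [list(c) for c in clauses]
    while True:
        model = solve_cnf(clauses, n_vars)
        if model is None:
            return None  # no SAT model passes the residual check
        if residual_check(model):
            return model
        # Blocking clause: at least one variable must differ from the failed model.
        clauses.append([-v if model[v] else v for v in model])


# The CNF demands "at least one of variables 1..3 is true"; the residual
# constraint (kept outside the CNF here) demands that exactly two are true.
result = solve_with_residual([[1, 2, 3]], 3, lambda m: sum(m.values()) == 2)
print(result)  # {1: False, 2: True, 3: True}
```

In this toy run two SAT models are rejected and blocked before a model satisfying the residual constraint is found.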

Let us look at how this works on the following simple formula (cf. Fig. 3):

\(f \in 1\mathbin {..}n \rightarrow BOOL \wedge n = 3 \wedge f(1) = TRUE \wedge (\forall i . (i \in 2 \mathbin {..}n \Rightarrow f(i) \ne f(i - 1)))\)

  • in the deterministic phase the value of n is set to 3 and the value of f is partially computed to the set \(\{ 1\mapsto F1, 2\mapsto F2, 3\mapsto F3\}\), where F1, F2, F3 are Prolog variables. The universal quantifier is also expanded into two conjuncts \(f(2) \ne f(1)\) and \(f(3) \ne f(2)\).

  • the remaining formula is compiled, inlining the values for f and n and pre-compiling the function lookups. This results in the formula \(F1 = TRUE \wedge F2 \ne F1 \wedge F3 \ne F2\).

  • the formula is translated into a CNF over the propositional variables F1, F2, F3 resulting in five clauses \(\{ F1, \lnot F2 \vee \lnot F1, F1 \vee F2, \lnot F3 \vee \lnot F2, F2 \vee F3\}\).

  • this CNF is sent to a SAT solver, which computes the model \(F1, \lnot F2, F3\).

  • this model is propagated to B, transforming the partial value of f into a full value \(\{ 1\mapsto TRUE , 2\mapsto FALSE , 3\mapsto TRUE \}\).

  • in this case no further constraint solving is required. If, however, we had an additional conjunct \(m = card(f \rhd \{ TRUE \})\), it would be computed in this phase, resulting in \(m=2\).
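The steps above can be replayed in a few lines of Python. This is a sketch for illustration only: propositional variable i stands for Fi, the brute-force enumeration replaces the external SAT solver, and none of the names correspond to ProB internals.

```python
from itertools import product

n = 3
# Deterministic phase: f becomes {1 |-> F1, 2 |-> F2, 3 |-> F3}.
# f(1) = TRUE gives the unit clause [1]; each f(i) /= f(i-1) gives two clauses.
cnf = [[1]]
for i in range(2, n + 1):
    cnf.append([-i, -(i - 1)])  # Fi and F(i-1) are not both true
    cnf.append([i - 1, i])      # Fi and F(i-1) are not both false


def models(clauses, n_vars):
    """Enumerate all models by brute force (stand-in for a real SAT solver)."""
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            yield bits


solutions = list(models(cnf, n))
# Grounding: turn the SAT model back into a value for f.
f = {i + 1: value for i, value in enumerate(solutions[0])}
# Post-processing, e.g. m = card(f |> {TRUE}).
m = sum(1 for value in f.values() if value)
print(cnf)  # [[1], [-2, -1], [1, 2], [-3, -2], [2, 3]]
print(f, m)  # {1: True, 2: False, 3: True} 2
```

The five clauses and the unique model coincide with those listed above.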

Fig. 3. Solving Process Illustrated on an Example

Calling the SAT Solver. To send the CNF to the SAT solver we build on the Prolog interface to MiniSat from [7]. This interface was ported to SICStus Prolog by Sebastian Krings and adapted for recent versions of the Glucose SAT solver 4.0. We are also working on targeting other SAT solvers, e.g., Kissat. Note that we call the SAT solvers directly on the generated CNF, without overhead. We have also extended our Z3 interface [37] to be able to send and solve SAT formulas in CNF (rather than SMT formulas).

Cardinality Constraints. The new B2Sat backend is of particular interest for finding solutions to complex constraints. In many of these cases one wants to minimize an objective function, often in the form of minimizing or maximizing the cardinality of a set. Also in Sect. 2 we required cardinality constraints for the subformula K in Cook’s theorem.

To enable these uses of B2Sat the CNF conversion phase supports constraints of the form \(card(\{x,\ldots \mid P\}) \circ Expr\) where \(\circ \in \{\le , <, =, \ge , >\}\). For this conversion to work, we need to be able to extract a finite set of distinct candidates for the set \(\{x,\ldots \mid P\}\). This works by re-using the quantifier instantiation technique used above, expanding \(\exists x.(P)\) into a disjunction, and checking that each disjunct corresponds to a unique candidate element of the set.

For example, let us examine the formula \(f\in 1..3 \rightarrow BOOL \wedge f(1)= TRUE \wedge card(\{i|i\in 1 \mathbin {..}3 \wedge f(i)= FALSE \}) = 1\). As above, we would generate three propositional logic variables F1, F2, F3 for the contents of f. In this case quantifier expansion will create three candidate disjuncts for the set \(\{i|i\in 1 \mathbin {..}3 \wedge f(i)= FALSE \}\): \(f(1)= FALSE \), \(f(2)= FALSE \) and \(f(3)= FALSE \). This gets translated into three propositional logic literals \(\lnot F1,\lnot F2,\lnot F3\). We now have to encode that exactly one of these three literals is true, e.g., as follows in CNF: \(\{\lnot F1 \vee \lnot F2 \vee \lnot F3, F1 \vee F2, F1 \vee F3, F2 \vee F3\}\).
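The CNF of this example can be generated and checked mechanically. The sketch below uses the straightforward pairwise encoding of “exactly one” (an assumption chosen for illustration; it happens to produce the clauses listed above) and a brute-force enumeration in place of a SAT solver. Variable i stands for Fi.

```python
from itertools import combinations, product


def exactly_one(literals):
    """Pairwise CNF encoding of 'exactly one of the literals is true'."""
    clauses = [list(literals)]                                    # at least one
    clauses += [[-a, -b] for a, b in combinations(literals, 2)]   # at most one
    return clauses


def models(clauses, n_vars):
    """Enumerate all models by brute force (stand-in for a real SAT solver)."""
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            yield bits


# Candidates f(1)=FALSE, f(2)=FALSE, f(3)=FALSE correspond to literals -1, -2, -3.
card_cnf = exactly_one([-1, -2, -3])
print(card_cnf)  # [[-1, -2, -3], [1, 2], [1, 3], [2, 3]]

# Adding f(1) = TRUE as a unit clause leaves two solutions for (F1, F2, F3).
print(list(models([[1]] + card_cnf, 3)))  # [(True, False, True), (True, True, False)]
```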

Once we have a list of candidates of a set S as propositional logic literals \(L_{1},\ldots ,L_{k}\), we need to encode the various cardinality constraints:

  • \(card(S)=0\) or \(card(S)<1\) or \(card(S) \le 0\) (empty set) is \(\{ \lnot L_{1}, \ldots , \lnot L_{k} \}\).

  • \(card(S)=k\) or \(card(S) \ge k\) (complete set) is simply \(\{ L_{1}, \ldots , L_{k} \}\).

  • \(card(S)<0\) or \(card(S)>k\) or similar unsatisfiable constraints: we generate a contradiction \(\{ \bot \}\).

  • \(card(S)\ge 1\) or similar (at least one): \(\{ L_{1} \vee \ldots \vee L_{k} \}\).

  • \(card(S) < k\) or similar (not complete set): \(\{ \lnot L_{1} \vee \ldots \vee \lnot L_{k} \}\).

  • for the other cases we generate a sequential counter, counting the number of true literals among \(L_{1},\ldots ,L_{k}\), as described in [23].

There are many works on how to encode cardinality constraints in SAT (e.g., [39, 44]), but thus far we have fared well with the sequential counter encoding recommended by Knuth [23].
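For readers unfamiliar with sequential counters, the following Python sketch shows one common formulation for “at most k of the literals are true”. It is a simplified variant written for illustration: the variable layout, the function names and the use of n·k auxiliary variables are our assumptions and need not match the exact encoding of [23] or the one implemented in ProB. The brute-force check at the end confirms the encoding on a small instance.

```python
from itertools import product


def at_most_k(literals, k, next_var):
    """Sequential-counter CNF for 'at most k of the literals are true'.

    literals: non-zero ints (DIMACS style); next_var: first unused variable.
    Returns (clauses, next unused variable). The auxiliary s[i][j] is forced
    to be true whenever at least j+1 of literals[0..i] are true.
    """
    n = len(literals)
    if k < 0:
        return [[]], next_var                      # empty clause: unsatisfiable
    if k == 0:
        return [[-x] for x in literals], next_var  # all literals false
    s = [[next_var + i * k + j for j in range(k)] for i in range(n)]
    clauses = []
    for i, x in enumerate(literals):
        clauses.append([-x, s[i][0]])                            # x_i -> count >= 1
        if i == 0:
            continue
        for j in range(k):
            clauses.append([-s[i - 1][j], s[i][j]])              # counts never decrease
            if j > 0:
                clauses.append([-x, -s[i - 1][j - 1], s[i][j]])  # increment
        clauses.append([-x, -s[i - 1][k - 1]])                   # no overflow beyond k
    return clauses, next_var + n * k


def satisfiable_with(clauses, fixed, n_vars):
    """Brute force: are the clauses satisfiable with variables 1..len(fixed) fixed?"""
    for rest in product([False, True], repeat=n_vars - len(fixed)):
        bits = list(fixed) + list(rest)
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False


clauses, top = at_most_k([1, 2, 3, 4], 2, next_var=5)
ok = all(satisfiable_with(clauses, xs, top - 1) == (sum(xs) <= 2)
         for xs in product([False, True], repeat=4))
print(len(clauses), ok)
```

A constraint such as \(card(S) \ge c\) can be reduced to the same building block, since at least c of k literals are true exactly when at most \(k-c\) of their negations are true.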

Tooling Extensions. We have implemented several ways in ProB to interact with the new solver backend. First, in the ProB console you have the new commands :sat, :sat-z3, :sat-double-check and :sat-z3-double-check. The first can be used to solve a predicate with Glucose, the second with Z3 [9] as SAT solver. The last two commands double-check the solution using ProB’s default solver. These commands are used in ProB’s integration tests.

Here is one of our examples in the console (started via probcli -repl):

figure c

It is possible to use the command :sat #file=FILE to load the predicate from a file. We have also made our solver available within ProB’s Jupyter kernel [15], as the following screenshot shows:

figure d

The backend can also be used to solve the properties (aka axioms) of B and Event-B models by setting the SOLVER_FOR_PROPERTIES preference. This preference can currently take the values: prob, sat, sat-z3, kodkod, z3, cdclt. Here sat will use our new B2Sat backend using the default SAT solver (Glucose) and sat-z3 will use B2Sat with Z3 as SAT solver. This feature is also available for the other state-based formalisms supported by ProB, e.g., TLA\(^{+}\) and Z.

4 Applications and Experiments

Dominating Sets. Dominating sets have various practical applications. In our context, they are relevant in biological models of leaves (e.g., [40]) as well as for data generation in railway topologies. Given a graph \(g\subset V\times V\) over a set of nodes V, a dominating set is a set of nodes \(D \subseteq V\) such that every node is either in D or has a neighbour in D: \(\forall n.(n\in V \setminus D \Rightarrow \exists d.(d\in D \wedge n\mapsto d \in g))\).

We can encode the above formula for B2Sat. Currently, we still need to rewrite our set D as a function to BOOL (in future we will remove this restriction). To find a minimal dominating set we can add additional constraints \(card(\{d\mid d\in V \wedge D(d)= TRUE \}) < b\), trying to find smaller and smaller solutions until we have found a minimal dominating set:

figure e

For minimisation or optimisation one often resorts to cardinality constraints. Here the minimisation was done by “hand”, but ProB also has a MINIMIZE predicate which can be used to automate the process.
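The manual minimisation loop can be sketched in Python as follows. This is an illustration only: the brute-force search stands in for the solver, the small path graph is made up, and the function names are hypothetical. As in the text, each round asks for a dominating set with fewer than b nodes and then tightens b.

```python
from itertools import combinations


def is_dominating(nodes, edges, dom):
    """Every node outside dom has an edge n |-> d to some d in dom."""
    return all(any((n, d) in edges for d in dom) for n in nodes - dom)


def find_dominating_set(nodes, edges, bound):
    """Return some dominating set with fewer than `bound` nodes, or None."""
    for size in range(bound - 1, -1, -1):  # like a solver, not necessarily minimal
        for cand in combinations(sorted(nodes), size):
            if is_dominating(nodes, edges, set(cand)):
                return set(cand)
    return None


# Made-up example: the undirected path 1 - 2 - 3 - 4 - 5 as a symmetric relation.
nodes = {1, 2, 3, 4, 5}
edges = {(a, b) for a in nodes for b in nodes if abs(a - b) == 1}

best, b = None, len(nodes) + 1
while (sol := find_dominating_set(nodes, edges, b)) is not None:
    best, b = sol, len(sol)  # next round: card(D) < card(previous solution)
print(best)  # {1, 4}
```

The loop stops once the bounded problem becomes unsatisfiable, at which point the last solution found is a minimum dominating set.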

The above constraints can be solved with the default solver of ProB. However, for bigger graphs the problem gets exponentially more complex (finding a minimal dominating set is an NP-complete problem). The left of Fig. 4 shows a minimal dominating set computed by B2Sat for a larger graph, representing a leaf, where the default solver’s runtime becomes intractable. In practical applications one often needs variations or extensions of the dominating sets concept:

  1. instead of considering only the direct neighbours, one can go k hops before reaching a dominating set element. D is then called a k-hop dominating set.

  2. one may require additional properties of D, e.g., that it be connected. From a connected dominating set one can extract a spanning tree.

The first one is very easy to express in B: simply apply the iterate operator on g before applying the universal quantifier. Connectedness can also be easily expressed. The right of Fig. 4 shows a minimal 1-hop connected dominating set for the same graph. This example stems from a biological application, to study suitable vein structures of leaves.

Fig. 4. Non-connected and connected 1-hop minimal dominating set. Green nodes are part of the dominating set. The graph represents a biological leaf. (Color figure online)

Similar issues can also appear in railway applications, e.g., for placing balises on a track to ensure certain safety criteria. As a proof of concept, we have solved an artificial problem on real data. We have used ProB to read in RailML data of the Oslo main station. The import uses the expressivity of B, also performing subsidiary rule validation [16]. The task was to place balises on the track such that a train must encounter a balise at least every three blocks. (A very recent article [31] discusses related data generation problems for railML.)

With B2Sat we could produce a more efficient version of our time-tabling tool [38]. We hope that our backend is also applicable to other verification tasks, e.g., for interlockings. This is related to techniques like Prover iLock [3, 4] or HLL [5, 13]. Our hope is to make such verification available while using the full expressive power of B. We also want to address verification of B hardware models [12, 42], in particular the CLEARSY safety platform [27] (LChip).

Crowded Chessboard. The crowded chessboard is a more than 100-year-old problem from [10]. The goal is to place a maximal number of chess pieces on a board, so that no piece attacks a piece of the same kind. In [26] we tried various approaches to solve the problem. In particular we developed a precursor of the present work, integrating ProB with Kodkod differently than in [35]. While better than the SMT and CLP(FD) encodings in [26], that approach was not very user-friendly (requiring explicit annotations) and considerably slower than B2Sat: the solving time for n = 8 is 19 s compared to 0.5 s with B2Sat. Note that our encoding is fully readable and is similar in performance to the direct SAT encoding from [26], while we can easily inspect, double-check and visualise the solution using ProB.

5 Experiments

We conducted an empirical evaluation of B2Sat. Table 1 contains the run times of our new backend. It shows times for pre-processing (quantifier expansion) and conversion to CNF (second column), times for SAT solving proper (with Glucose, column 5), and total solving times (including post-processing and times from columns 2 and 5).

All benchmarks were run on a MacBook Air with M2 processor, 24 GB RAM and version 1.13.1-beta1 of ProB compiled with SICStus Prolog 4.8.0. We used Z3 in version 4.13.0.0 and Glucose in version 4.0. For the Kodkod backend we also used Glucose as SAT solver to enable a fair comparison. All times are wall times in milliseconds (ms) and the timeout was set at 20 s (but B2Sat does not yet support time-outs during the SAT solver runs).

The benchmarks contain the examples from above: bounded model checking of the Turing machine from Fig. 1 for 20 steps (TuringMachine_Cook_20), the crowded chessboard puzzle for an 8\(\times \)8 chessboard (CrowdedChessBoard), dominating sets for leaves (DominatingSet_Middle), and balise placement on the Oslo main station. We also included three pure SAT problems (blocksworld and uuf) in B, to measure the overhead when writing SAT problems in B rather than in CNF format. We have also included some benchmarks from the IDP-Z3 [6] system, which we translated to B: queens, transitive closure, and pigeon hole. The translation was straightforward. These IDP models use quantifiers instead of natural B operators (e.g., perm for queens or closure1 for the transitive closure), which would be considerably faster in B. Still, the models are a good way to evaluate B2Sat, whose results are very good compared to the results on the IDP-Z3 GitHub site.

Table 1. B2Sat Backend of ProB: (1) B to CNF Pre-Processing, (2) Glucose SAT Solving and (3) Total Walltime including Post-Processing
Table 2. Total Walltime for Solving with various Backends of ProB. stands for unknown, \(\checkmark \) for the correct result.

Other Backends. The comparison with other backends of ProB is shown in Table 2. As one can see, the Kodkod backend [35] was only applicable to 5 of the 14 examples. Some of the examples can be rewritten to make the backend applicable (see below). When applicable, however, it is often considerably slower than B2Sat, including for the three SAT problems.

The default CLP(FD) backend of ProB can solve 7 out of the 14 examples. The Z3 SMT backend can only solve 3 of the 14 examples; the treatment of quantifiers and cardinality constraints is a weak spot of this backend. Z3 can still be very useful as a SAT solver, as we explain below.

Other SAT Solvers. To keep the tables readable, we have not included runtimes when using Z3 instead of Glucose for B2Sat in Table 1. For most smaller examples, Glucose is much faster than Z3 (e.g., 21 ms vs. 231 ms for pigeon_30). This is probably because the Prolog SAT interface based on [7] is faster than Z3’s C++ interface. For complex examples, however, Z3 can be a useful alternative SAT solver. For example, for OSLO_dom_edge_lt_181 it is four times faster than Glucose. It is good to have a variety of SAT solvers at our disposal, especially for optimisation, where solving time increases when we approach the optimum.

In summary, the tables show that for the benchmarks above, B2Sat is a considerable improvement over existing backends, and opens up new application areas for B. There is still a performance bottleneck in the compilation phase, as can be seen in the queens, transitive closure and Turing examples in Table 1. The overhead is due to partially instantiated data values having linear rather than logarithmic access in ProB. We hope to reduce this overhead considerably in the future, e.g., by also using AVL trees for partially instantiated values.

Tables 1 and 2 are biased: we only study examples which can be solved by B2Sat. Also, some benchmarks can be rewritten for Kodkod by replacing strings with enumerated sets, rewriting functions to predicates, or adding additional constraints to make the bounds finite or remove higher-order constructs by hand. For example, by rewriting the pigeon_30 example and removing the higher-order function it can be solved in 596 ms. The purpose of the experiments is to show that there are applications where B2Sat is very effective; it is not to study the performance for a representative set of benchmark programs.

6 Related and Future Work

Other B Backends. We can compare B2Sat with the backends from Table 2:

  • ProB’s default solver is based on constraint logic programming. As mentioned, it scales to very large and complex data values and has been used in industry for B specifications with up to 9 million lines of B. It has good deterministic propagation, can deal symbolically with infinite values and is well suited for animation and data validation. The boolean solver was inspired by [19, 20], but without watched literals. Also, there is neither clause learning nor conflict analysis.

  • The CDCL(T) backend [37] is a Prolog SMT-style solver built on top of ProB’s default solver. It does have clause learning and conflict analysis, but its performance as a SAT solver is far from that of state-of-the-art SAT solvers. It is useful for symbolic verification tasks but, as Table 2 shows, not for the constraint solving and optimisation tasks here.

  • The Z3 SMT backend [37] is based on a translation of B to SMT-LIB. It works better with unsatisfiable formulas than for model finding of satisfiable ones. The backend is good for symbolic verification tasks, but still has considerable restrictions (cardinality, quantifiers, finite B values often get translated to infinite ones in SMT-LIB, ...). As such it is not suited as an animation engine and, as Table 2 shows, not for the benchmarks here. The Apalache symbolic model checker [25] for TLA\(^{+}\) uses Z3 [9] as backend, but with an encoding tailored for finite sets. As such it is better suited for model finding. Indeed, we were able to solve a small dominating set example in TLA\(^{+}\) but not DominatingSet_Middle_lt13 from Table 1.

  • The Kodkod backend [35] uses the Kodkod library [41] to perform an indirect translation of B to SAT (via the relational logic API of [41]). When applicable, it can be very effective and much more efficient than ProB’s default solver. It has, however, limited applicability (no sets of sets, no higher-order relations or functions, restrictions to binary relations).

The Kodkod backend [35] is the closest to our approach, and we want to clarify the important differences:

  • [35] is a static ahead-of-time translation on the AST (abstract syntax tree). As such there is no expansion prior to translation, meaning we cannot use many of B’s nice features (higher-order, ...) to set up the constraints. B2Sat is dynamic (just-in-time, e.g., after quantifier expansions) and can process a mixture of AST and partially instantiated values.

  • [35] can translate integers and more operators to SAT than B2Sat.

  • There is an overhead in the generated SAT problem with [35] (see Table 2 for pure SAT problems).

  • There is an issue with integer overflows in Kodkod, which is not easy to solve (meaning the current backend [35] is not sound for cardinality constraints or some integer membership constraints).

  • [35] cannot deal with higher-order relations, nor with ternary relations (see pigeon_30 in Table 2).

Other Languages Translating to SAT. The Alloy analyzer [21] uses the Kodkod library, and is again a static ahead-of-time translation to SAT. Arby [33] is an embedding of Alloy into Ruby. One could thus write an Arby program to expand the Turing machine from Sect. 2 into an Alloy model, which in turn would get translated to SAT. Our approach is to use logic, mathematics and the B language to set up SAT constraints (rather than a separate scripting language).

Answer Set Programming (ASP) [11] starts off from a logic program, which usually (see [14]) gets transformed via a grounding phase to a SAT problem. ASP builds on a non-monotonic semantics, while our approach is rooted in mathematical logic with classical monotonic negation and with access to theorem provers.

In the logic-based language Picat [45] one can use SAT solvers for constraint solving. A related approach is IDP-Z3 [32], based on inductive definitions rather than logic programs. IDP-Z3 is a re-implementation of [43]. We have used some IDP-Z3 benchmarks above. SMT-LIB itself, in particular when using eager solving, is also related to B2Sat. Apart from expressivity, a major difference is that in B2Sat a separate constraint-based solver is driving the translation to SAT. This increases the range of specifications that can be handled and the pre-processing that can take place (see also [17]). Indeed, in the crowded chessboard example the direct SMT-LIB solutions were not effective [26] (in contrast to B2Sat).

Future Work. We wish to extend the subset of B which can be translated to SAT (integer operators, finite-domain variables, sets,...). We also want to keep track of when propagation of a SAT solution fails in ProB, to then compute the unsat core and add it as a learned clause. We plan to target other SAT solvers, like Kissat, and would also like to target SMT rather than SAT. As we saw in Table 2, the current Z3 backend does not work well for model finding or optimisation. By generating SMT-LIB without quantifiers this could be much improved.

In the future we wish to make the optimisation process more efficient. ProB already has the functions MAXIMIZE and MINIMIZE, but we wish to use incremental SAT solving and other algorithms from the SAT community [24].

In summary, we have presented a new bare-metal SAT backend for B. We have shown how it can be applied almost out of the box to a mathematical rendering of Cook’s theorem. With our new backend one can use the full power of B to pre- and post-process higher-order data and properties, solve and optimise complex problems and use the B tooling infrastructure to visualise solutions and double-check them with other backends. We hope that this leads to readable, maintainable and efficient SAT applications with state-based formal methods like B, Z or TLA\(^{+}\). While this approach is certainly not a universal technique, it enables a wide variety of new applications: graph matching for machine learning, dominating sets for biological applications, hardware modelling and verification, data generation for industrial railway applications, bounded model checking for railway interlockings, and many more.