Multi-core On-The-Fly Saturation
Abstract
Saturation is an efficient exploration order for computing the set of reachable states symbolically. Attempts to parallelize saturation have so far resulted in limited speedup. We demonstrate for the first time that on-the-fly symbolic saturation can be successfully parallelized at a large scale. To this end, we implemented saturation in Sylvan’s multi-core decision diagrams used by the LTSmin model checker.
We report extensive experiments, measuring the speedup of parallel symbolic saturation on a 48-core machine, and compare it with the speedup of parallel symbolic BFS and chaining. We find that the parallel scalability varies from quite modest to excellent. We also compared the speedup of on-the-fly saturation and saturation for pre-learned transition relations. Finally, we compared our implementation of saturation with the existing sequential implementation based on Meddly.
The empirical evaluation uses Petri nets from the Model Checking Contest, but thanks to the architecture of LTSmin, parallel on-the-fly saturation is now available to multiple specification languages. Data or code related to this paper is available at: [34].
1 Introduction
Model checking is an exhaustive algorithm to verify that a finite model of a concurrent system satisfies certain temporal properties. The main challenge is to handle the large state space, resulting from the combination of parallel components. Symbolic model checking exploits regularities in the set of reachable states, by storing this set concisely in a decision diagram. In asynchronous systems, transitions have locality, i.e. they affect only a small part of the state vector. This locality is exploited in the saturation strategy, which is probably the most efficient strategy to compute the set of reachable states.
In this paper, we investigate the efficiency and speedup of a new parallel implementation of saturation, aiming at a multi-core, shared-memory implementation. The implementation is carried out in the parallel decision diagram framework Sylvan [16], in the language-independent model checker LTSmin [22]. We empirically evaluate the speedup of parallel saturation on Petri nets from the Model Checking Contest [24], running the algorithm on up to 48 cores.
1.1 Related Work
The saturation strategy has been developed and improved by Ciardo et al. We refer to [13] for an extensive description of the algorithm. Saturation derives its efficiency from firing all local transitions that apply at a certain level of the decision diagram, before proceeding to the next higher level. An important step in the development of the saturation algorithm was to allow on-the-fly generation of the transition relations, without knowing the cardinality of the state variable domains in advance [12]. This is essential to implement saturation in LTSmin, which relies on the PINS interface to discover transitions on-the-fly.
Since saturation obtains its efficiency from a restrictive firing order, it seems inherently sequential. Yet the problem of parallelising saturation has been studied intensively. The first attempt, Saturation NOW [9], used a network of PCs. This version could exploit the collective memory of all PCs, but due to the sequential procedure, no speedup was achieved. By firing local transitions speculatively (but with care to avoid memory waste), some speedup has been achieved [10]. More relevant to our work is the parallelisation of saturation for a shared-memory architecture [20]. The authors used CILK to schedule parallel work originating from firing multiple transitions at the same level. They reported some speedup on a dual-core machine, at the expense of a serious memory increase. Their method also required precomputing the transition relation. An improvement of the parallel synchronisation mechanism was provided in [31]. They reported a parallel speedup of 2\(\times \) on 4 CPUs. Moreover, their implementation supports learning the transition relation on-the-fly. Still, the successful parallelisation of saturation remained largely open, as indicated by Ciardo et al. [14]: “Parallel symbolic state-space exploration is difficult, but what is the alternative?”
For an extensive overview of parallel decision diagrams on various hardware architectures, see [15]. Here we mention some other approaches to parallel symbolic model checking, different from saturation for reachability analysis. First, Grumberg and her team [21] designed a parallel BDD package based on vertical partitioning. Each worker maintains its own sub-BDD. Workers exchange BDD nodes over the network. They reported some speedup on 32 PCs for BDD-based model checking under the BFS strategy. The Sylvan [16] multi-core decision diagram package supports symbolic on-the-fly reachability analysis, as well as bisimulation minimisation [17]. Oortwijn [28] experimented with a heterogeneous distributed/multi-core architecture, by porting Sylvan’s architecture to RDMA over MPI, running symbolic reachability on 480 cores spread over 32 PCs and reporting speedups of BFS symbolic reachability of up to 50. Finally, we mention some applications of saturation beyond reachability, such as CTL model checking [32] and computing strongly connected components to detect fair cycles [33].
1.2 Contribution
Here we show that implementing saturation on top of the multi-core decision diagram framework Sylvan [16] yields a considerable speedup in a shared-memory setting: up to 32.5\(\times \) on 48 cores with pre-learned transition relations, and 52.2\(\times \) with on-the-fly transition learning.
By design, our implementation reuses several features provided by Sylvan, such as: its own fine-grained, work-stealing framework Lace [18], its implementation of both BDDs (Binary Decision Diagrams) and LDDs (a List-implementation of Multi-way Decision Diagrams), its concurrent unique table and operations cache, and finally, its parallel operations like set union and relational product. As a consequence, the pseudocode of the algorithm and the additional code for saturation are quite small, and orthogonal to other BDD features. To improve orthogonality with the existing decision diagrams, we deviated from the standard presentation of saturation [13]: we never update BDD nodes in situ, and we eliminated the mutual recursion between saturation and the BDD operations for relational product to fire transitions.
The implementation is available in the open-source high-performance model checking tool LTSmin [22], with its language-agnostic interface, the Partitioned Next-State Interface (PINS) [5, 22, 25]. Here, a specification basically provides a next-state function equipped with dependency information, from which LTSmin can derive locality information. We fully support the flexible method of learning the transition relation on-the-fly during saturation [12]. As a consequence, our contribution extends the tool LTSmin with saturation for various specification languages, like Promela, DVE, Petri nets, mCRL2, and languages supported by the ProB model checker. See Sect. 4 on how to use saturation in LTSmin.

Our empirical evaluation in Sect. 5 investigates:
- the performance of our parallel algorithm with one worker, compared to a state-of-the-art sequential implementation of saturation in Meddly [4];
- the parallel speedup of our algorithm on 16 cores, and for specific examples up to 48 cores;
- the efficiency and speedup of saturation compared to the BFS and chaining strategies for reachability analysis;
- the effect of choosing Binary Decision Diagrams or List Decision Diagrams;
- the effect of choosing Sloan or Force to compute static variable orders.
2 Preliminaries
This paper proposes an algorithm for decision diagrams to perform the fixed point application of multiple transition relations according to the saturation strategy, combined with on-the-fly transition learning as implemented in LTSmin. We briefly review these concepts in the following.
2.1 Partitioned Transition Systems
In this paper, we evaluate multi-core saturation using Petri nets. Figure 1 shows an example of a (safe) Petri net in its initial marking, which is the initial state. A Petri net transition can fire if there is a token in each of its source places. On firing, these tokens are consumed and a token is produced in each target place. For example, \(t_1\) will produce one token in both \(p_2\) and \(p_5\), if there is a token in \(p_4\). Transition \(t_6\) requires a token in both \(p_3\) and \(p_1\) to fire. The markings of this Petri net form the states S of the corresponding transition system (TS); since the net is safe, each of the five places holds at most one token, so here \(\vert S \vert = 2^5 = 32\). From the initial marking shown, four more markings are reachable, connected by 10 enabled transition firings. Hence the set of reachable states R has \(\vert R \vert = 5\), and \(\vert {\rightarrow }\vert = 10\).
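Since the net of Fig. 1 is only available as a figure, the following minimal Python sketch illustrates the firing rule and the quantities \(\vert S\vert\), \(\vert R\vert\) and \(\vert {\rightarrow }\vert\) on a hypothetical safe net instead: two independent two-place cycles over places a1, b1, a2, b2. Markings are modelled as frozensets of marked places. This explicit-state search only illustrates the semantics; it is not how LTSmin explores Petri nets.

```python
from collections import deque

# Hypothetical safe Petri net (not Fig. 1): two independent cycles.
# Each transition maps to (source places, target places).
TRANSITIONS = {
    "t1": (frozenset({"a1"}), frozenset({"b1"})),
    "t2": (frozenset({"b1"}), frozenset({"a1"})),
    "t3": (frozenset({"a2"}), frozenset({"b2"})),
    "t4": (frozenset({"b2"}), frozenset({"a2"})),
}


def enabled(marking, pre):
    # A transition can fire if every source place holds a token.
    return pre <= marking


def fire(marking, pre, post):
    # Consume the source tokens, then produce a token in each target place
    # (a set of places suffices because the net is assumed to be safe).
    return (marking - pre) | post


def explore(initial):
    """Breadth-first search; returns the reachable markings and all firings."""
    seen, firings = {initial}, set()
    queue = deque([initial])
    while queue:
        m = queue.popleft()
        for name, (pre, post) in TRANSITIONS.items():
            if enabled(m, pre):
                succ = fire(m, pre, post)
                firings.add((m, name, succ))
                if succ not in seen:
                    seen.add(succ)
                    queue.append(succ)
    return seen, firings


reachable, firings = explore(frozenset({"a1", "a2"}))
print(2 ** 4, len(reachable), len(firings))  # |S|, |R|, |->|: 16 4 8
```

Every reachable marking of this toy net enables exactly two transitions, which is why the 4 reachable markings give 8 firings.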
Notice that transitions in Petri nets are quite local; transitions consume from, and produce into relatively few places. The firing of a Petri net transition is called an event and the number of involved places is known as the degree of event locality. This notion is easily defined for other asynchronous specification languages and can be computed by a simple control flow graph analysis.
2.2 Decision Diagrams
Binary decision diagrams (BDDs) are a concise and canonical representation of Boolean functions \(\mathbb {B}^N\rightarrow \mathbb {B}\) [7]. A BDD is a rooted directed acyclic graph with leaves 0 and 1. Each internal node v has a variable label \(x_i\), denoted by \(\mathrm {var}(v)\), and two outgoing edges labeled 0 and 1, denoted by \(\mathrm {low}(v)\) and \(\mathrm {high}(v)\). The efficiency of reduced, ordered BDDs is achieved by minimizing the structure with some invariants: the BDD may neither contain equivalent nodes, with the same \(\mathrm {var}(v)\), \(\mathrm {low}(v)\) and \(\mathrm {high}(v)\), nor redundant nodes, with \(\mathrm {low}(v)=\mathrm {high}(v)\). Also, the variables must occur according to a fixed ordering along each path.
Multi-valued or multi-way decision diagrams (MDDs) generalize BDDs to finite domains (\(\mathbb {N}^N\rightarrow \mathbb {B}\)). Each internal MDD node with variable \(x_i\) now has \(n_i\) outgoing edges, labeled 0 to \(n_i-1\). We use quasi-reduced MDDs with sparse nodes. In the sparse representation, values with edges to leaf 0 are skipped from MDD nodes, so outgoing edges must be explicitly labeled with the remaining domain values. Contrary to BDDs, MDDs are usually “quasi-reduced”, meaning that variables are never skipped. In that case, the variable \(x_i\) can be derived from the depth of the MDD, so it is not stored.
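Both reduction rules can be enforced when nodes are created, using a unique table. The following minimal Python sketch shows this idea; the class and method names are hypothetical (this is not Sylvan's API), nodes are integer ids with 0 and 1 as the leaves, and the fixed variable order is assumed rather than checked: callers must only pass children that test larger variable indices.

```python
class BDD:
    """Tiny reduced ordered BDD node store (illustrative sketch only).

    Node ids 0 and 1 are the leaves; every other id refers to a
    (var, low, high) triple stored in self.nodes.
    """

    def __init__(self):
        self.nodes = [None, None]  # slots 0 and 1 are the leaves
        self.unique = {}           # (var, low, high) -> node id

    def make_node(self, var, low, high):
        if low == high:             # rule: no redundant nodes
            return low
        key = (var, low, high)
        if key not in self.unique:  # rule: no equivalent (duplicate) nodes
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def evaluate(self, node, assignment):
        """Follow low/high edges according to assignment[var]."""
        while node > 1:
            var, low, high = self.nodes[node]
            node = high if assignment[var] else low
        return node == 1


bdd = BDD()
x1 = bdd.make_node(1, 0, 1)  # the function x1
f = bdd.make_node(0, 0, x1)  # x0 AND x1, with order x0 < x1
bdd.make_node(1, 0, 1)       # equivalent node: existing id is returned
bdd.make_node(2, x1, x1)     # redundant test: x1 is returned, nothing stored
print(len(bdd.nodes) - 2, bdd.evaluate(f, [True, True]), bdd.evaluate(f, [True, False]))
# prints: 2 True False
```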
A variation of MDDs are list decision diagrams (LDDs) [5, 16], where sparse MDD nodes are represented as a linked list. See Fig. 2 for two visual representations of the same LDD. Each LDD node contains a value, a “down” edge for the corresponding child, and a “right” edge pointing to the next element in the list. Each list ends with the leaf 0 and each path from the root downwards ends with the leaf 1. The values in an LDD are strictly ordered, i.e., the values must increase to the “right”.
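The following minimal Python sketch builds the LDD of a set of state vectors and enumerates it again. It assumes integer node ids with 0 and 1 as the leaves, and a hash table that returns the existing node for equal (value, down, right) fields, similar in spirit to Sylvan's LookupLDDNode; the class and method names are hypothetical.

```python
class LDDStore:
    """Hash-consed LDD nodes (illustrative sketch, not Sylvan's C API).

    Node id 0 is leaf 0 (end of a list), id 1 is leaf 1 (end of a path);
    every other id refers to a (value, down, right) triple.
    """

    def __init__(self):
        self.nodes = [None, None]
        self.unique = {}

    def lookup(self, value, down, right):
        # Create the node, or return the existing node with equal fields.
        key = (value, down, right)
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def from_vectors(self, vectors):
        """Build the LDD of a set of equal-length integer vectors."""
        vectors = set(vectors)
        if not vectors:
            return 0
        if vectors == {()}:
            return 1
        groups = {}
        for vec in vectors:
            groups.setdefault(vec[0], set()).add(vec[1:])
        # Build each list from its largest value leftwards, so that
        # values strictly increase along "right" edges.
        node = 0
        for value in sorted(groups, reverse=True):
            node = self.lookup(value, self.from_vectors(groups[value]), node)
        return node

    def to_vectors(self, node):
        """Enumerate the vectors encoded by an LDD."""
        if node == 0:
            return set()
        if node == 1:
            return {()}
        value, down, right = self.nodes[node]
        here = {(value,) + rest for rest in self.to_vectors(down)}
        return here | self.to_vectors(right)


store = LDDStore()
states = {(0, 1), (0, 2), (1, 1), (1, 2)}
root = store.from_vectors(states)
print(len(store.nodes) - 2, store.to_vectors(root) == states)  # 4 True
```

Because nodes are hash-consed, the lists for values 0 and 1 share a single "down" child (the list [1, 2]), so four internal nodes suffice for the four vectors.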
2.3 Variable Orders and Event Locality
Event span is defined on the dependency matrix (one row per transition, one column per state variable, with a non-zero entry where the transition depends on the variable) as the sum over all rows of the distance from the leftmost non-zero column to the rightmost non-zero column. The event span of Fig. 3a is 22 (\(= 4 + 2 + 2 + 5 + 5 + 4\)); the event span of Fig. 3b is 16, which is better. Optimizing the event span, and thus the variable order of DDs, is NP-complete [6], yet there are heuristic approaches that run in sub-quadratic time and provide good enough orders. Commonly used algorithms are Noack [27], Force [1] and Sloan [30]. Noack creates a permutation of variables by iteratively minimizing some objective function. The Force algorithm acts as if there are springs between the non-zeros in the dependency matrix, and tries to minimize the average tension among them. Sloan tries to minimize the profile of matrices; in short, profile is the symmetric counterpart of event span. For a more detailed overview of these algorithms see [3]. In our empirical evaluation we use both Sloan and Force, because these have been shown to give the best results [2, 26].
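To make the definition concrete, the following Python sketch computes the event span of a dependency matrix given as one set of variable indices per row, assuming that "distance" means the difference of the two column positions. It also includes a simplified version of the Force heuristic of [1]: each variable moves towards the centres of gravity of the rows it occurs in, and the best order seen is kept. The matrix is hypothetical (not the one of Fig. 3), and LTSmin's actual Force implementation may differ in details such as the stopping criterion.

```python
def event_span(rows, order):
    """Sum over rows of (rightmost - leftmost) non-zero column position.

    rows:  one set of variable indices per row of the dependency matrix.
    order: order[p] is the variable placed at column position p.
    """
    pos = {var: p for p, var in enumerate(order)}
    return sum(max(pos[v] for v in row) - min(pos[v] for v in row)
               for row in rows if row)


def force(rows, order, iterations=50):
    """Simplified FORCE: move each variable towards the centres of gravity
    of the rows it occurs in; return the best (order, span) seen."""
    rows = [row for row in rows if row]
    current = list(order)
    best, best_span = list(current), event_span(rows, current)
    for _ in range(iterations):
        pos = {var: p for p, var in enumerate(current)}
        cogs = [sum(pos[v] for v in row) / len(row) for row in rows]
        target = {}
        for var in current:
            mine = [c for row, c in zip(rows, cogs) if var in row]
            target[var] = sum(mine) / len(mine) if mine else pos[var]
        # Sort by target position; break ties by the current position.
        new = sorted(current, key=lambda v: (target[v], pos[v]))
        if new == current:
            break
        current = new
        span = event_span(rows, current)
        if span < best_span:
            best, best_span = list(current), span
    return best, best_span


rows = [{0, 2}, {1, 3}]  # hypothetical 2 x 4 dependency matrix
print(event_span(rows, [0, 1, 2, 3]))  # 4
print(force(rows, [0, 1, 2, 3]))       # ([0, 2, 1, 3], 2)
```

Placing variables 0 and 2 next to each other (and likewise 1 and 3) halves the span here, which is the kind of grouping that keeps transitions local in the decision diagram.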
2.4 The Saturation Strategy
The saturation strategy for reachability analysis, i.e., computing the transitive closure of the transition relations applied to some set of states, was first proposed by Ciardo et al.; see [11, 13] for an overview. Saturation was combined with on-the-fly transition learning in [12]. Besides reachability, saturation has also been applied to CTL model checking [32] and to checking fairness constraints with strongly connected components [33].
Saturation is well-studied. The core idea is to always fire enabled transitions at the lower levels of the decision diagram, before proceeding to the next level. This tends to keep the intermediate decision diagrams much smaller than, for instance, the breadth-first exploration strategy does. This is in particular the case for asynchronous systems, where transitions exhibit locality. The variable ordering also has a major influence: if the variables involved in a transition are grouped together, then this transition only affects adjacent levels in the decision diagram.
We refer to [13] for a precise description of saturation. Our implementation deviates from the standard presentation in three ways. First, we implemented saturation for LDDs and BDDs, instead of MDDs. Second, we never update nodes in the LDD forest in situ; instead, we always create new nodes. Finally, the standard presentation has a mutual recursion between saturation and firing transitions. Instead, we fire transitions using the existing function for relational product, which is called from our saturation algorithm. As a consequence, the extension with saturation becomes more orthogonal to the specific decision diagram implementation. We refer to Sect. 3 for a detailed description of our algorithm. We show in Sect. 5 that these design decisions do not introduce computational overhead.
3 Multicore Saturation Algorithm
To access the three elements of an LDD node x, Sylvan [16] provides the functions \(\texttt {value}(x)\), \(\texttt {down}(x)\), \(\texttt {right}(x)\). To create or retrieve a unique LDD node using the hash table, Sylvan provides \(\texttt {LookupLDDNode}(\textit{value},\textit{down},\textit{right})\).
We implement multi-core saturation as in Algorithm 1. We have a transition relation disjunctively partitioned into M relations \(R_0,\dots ,R_{M-1}\). These relations are sorted by the level (depth) of the decision diagram where they are applied, which is the first level touched by the relation. We say that relation \(R_i\) is applied at depth \(d_i\). We identify the current next relation with a number k, \(0\le k\le M\), where \(k=M\) denotes “no next relation”. Decision diagram levels are sequentially numbered with 0 for the root level.
The saturate algorithm is given the initial set of states S, the index \(k=0\) of the first transition relation to apply, and the initial decision diagram level \(d=0\). The algorithm is a straightforward implementation of saturation. First we check the easy cases where we reach either the end of an LDD list, where \(S=0\), or the bottom of the decision diagram, where \(S=1\). If there are no more transition relations to apply, then \(k=M\) and we can simply return S. When we arrive at line 4, the operation is not trivial and we consult the operation cache.
If the result of this operation is not already in the cache, then we check whether we have relations at the current level. Since the relations are sorted by the level where they must be applied, we compare the current level d with the level \(d_k\) of the next relation k. If we have relations at the current level, then we perform a fixed point computation: we first saturate S for the remaining relations, starting at relation \(k'\), which is the first relation that must be applied on a deeper level than d, and then apply the relations of the current level, that is, all \(R_i\) where \(k\le i<k'\); this is repeated until S no longer changes. If no relations match the current level, then we compute in parallel the results of the sub-operations for the LDD of successor “right” (which stays at level d) and for the LDD of successor “down” (at level \(d+1\)). After obtaining these sub-results, we use LookupLDDNode to compute the final result for this LDD node. Finally, we store this result in the operation cache and return it.
The implementation of multi-core saturation for BDDs is identical, except that we parallelize on the “then” and “else” successors of a BDD node, instead of on the “down” and “right” successors of an LDD node.
The learn-transitions function provided by LTSmin updates relation i given a set of states S. The function first restricts S to so-called short states \(S^i\), which is the projection of S on the state variables that are touched by relation i. Then it calls the next-state function of the PINS interface for each new short state and updates \(R_i\) with the new transitions.
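The recursion can be illustrated by the following sequential Python sketch. It assumes hash-consed integer node ids (with `lookup_ldd_node` standing in for LookupLDDNode) and relations given as Python functions on the suffix of the state vector that starts at their depth. Union and relational product are implemented by explicitly enumerating states, which a real decision diagram implementation such as Sylvan never does; the place where Sylvan would run the “right” and “down” calls in parallel is marked in a comment. The token-ring model is hypothetical and not taken from the paper.

```python
from bisect import bisect_right

# Hash-consed LDD nodes: id 0 = leaf 0 (empty list), id 1 = leaf 1
# (end of a path); other ids index (value, down, right) triples.
NODES, UNIQUE = [None, None], {}

def lookup_ldd_node(value, down, right):
    key = (value, down, right)
    if key not in UNIQUE:
        UNIQUE[key] = len(NODES)
        NODES.append(key)
    return UNIQUE[key]

def from_vectors(vectors):
    vectors = set(vectors)
    if not vectors:
        return 0
    if vectors == {()}:
        return 1
    groups = {}
    for vec in vectors:
        groups.setdefault(vec[0], set()).add(vec[1:])
    node = 0
    for value in sorted(groups, reverse=True):  # values increase rightwards
        node = lookup_ldd_node(value, from_vectors(groups[value]), node)
    return node

def to_vectors(node):
    if node < 2:
        return {()} if node == 1 else set()
    value, down, right = NODES[node]
    return {(value,) + r for r in to_vectors(down)} | to_vectors(right)

def union(a, b):  # explicit stand-in for Sylvan's parallel union
    return from_vectors(to_vectors(a) | to_vectors(b))

def rel_prod(s, rel):  # explicit stand-in for the relational product
    return from_vectors({t for v in to_vectors(s) for t in rel(v)})

def saturate(s, k, d, rels, cache):
    """rels: (depth, rel) pairs sorted by depth; rel maps a suffix vector
    (levels depth..n-1) to a list of successor suffix vectors."""
    if s in (0, 1) or k == len(rels):
        return s
    key = (s, k, d)
    if key in cache:
        return cache[key]
    depths = [depth for depth, _ in rels]
    if depths[k] == d:
        k2 = bisect_right(depths, d)  # first relation applied deeper than d
        while True:  # fixed point: saturate below, then fire R_k..R_{k2-1}
            prev = s
            s = saturate(s, k2, d, rels, cache)
            for i in range(k, k2):
                s = union(s, rel_prod(s, rels[i][1]))
            if s == prev:
                break
        result = s
    else:
        value, down, right = NODES[s]
        # Sylvan would run these two independent calls in parallel.
        r = saturate(right, k, d, rels, cache)      # same level d
        dn = saturate(down, k, d + 1, rels, cache)  # next level
        result = lookup_ldd_node(value, dn, r)
    cache[key] = result
    return result

# Hypothetical model: one token circulating over positions 0 -> 1 -> 2 -> 0.
def pass01(v):  # applied at depth 0 (touches levels 0 and 1)
    x0, x1, x2 = v
    return [(0, 1, x2)] if (x0, x1) == (1, 0) else []

def pass20(v):  # applied at depth 0 (touches levels 0 and 2)
    x0, x1, x2 = v
    return [(1, x1, 0)] if (x0, x2) == (0, 1) else []

def pass12(v):  # applied at depth 1 (touches levels 1 and 2)
    x1, x2 = v
    return [(0, 1)] if (x1, x2) == (1, 0) else []

RELS = [(0, pass01), (0, pass20), (1, pass12)]
reach = saturate(from_vectors({(1, 0, 0)}), 0, 0, RELS, {})
print(sorted(to_vectors(reach)))  # [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
```

Note how `pass12`, which is applied at depth 1, is fired inside the recursive call on the “down” child, before the depth-0 relations are fired again at the root.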
Updating transition relations from multiple threads is not trivial. LTSmin solves this using lock-free programming with the compare-and-swap operation. After collecting all new transitions, LTSmin computes the union with the known transitions and uses compare-and-swap to update the global relation; if this fails, the union is repeated with the newly known transitions.
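The learn-and-publish step can be sketched in Python as follows. Python exposes no compare-and-swap instruction, so the hypothetical `AtomicRef` class emulates one with a lock. The relation is represented as an explicit pair of frozensets (explored short states, learned transitions) instead of a decision diagram, and `learn_transitions`, `step` and the batches of states are illustrative names, not LTSmin's API. The retry loop mirrors the description above: compute the union with the currently known relation, try to swap it in, and recompute if another thread updated the relation first.

```python
import threading


class AtomicRef:
    """Simulated atomic reference: a lock emulates a hardware CAS."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


def learn_transitions(states, projection, next_state, relation):
    """Hypothetical sketch of learning one relation R_i on the fly.

    relation holds a pair (explored short states, learned transitions).
    """
    # Restrict the full states to short states over the touched variables.
    shorts = {tuple(s[j] for j in projection) for s in states}
    new_shorts = shorts - relation.get()[0]
    if not new_shorts:
        return
    # Call the next-state callback only for short states not seen before.
    new_pairs = {(src, dst) for src in new_shorts for dst in next_state(src)}
    # Publish: union with the current relation, retry if another thread won.
    while True:
        old = relation.get()
        merged = (old[0] | new_shorts, old[1] | new_pairs)
        if relation.compare_and_swap(old, merged):
            return


def step(short):  # hypothetical next-state function on one variable
    return [((short[0] + 1) % 4,)]


relation = AtomicRef((frozenset(), frozenset()))
batches = [[(0, 5), (1, 5)], [(2, 7), (3, 7)], [(1, 9)]]
threads = [threading.Thread(target=learn_transitions,
                            args=(batch, [0], step, relation))
           for batch in batches]
for t in threads:
    t.start()
for t in threads:
    t.join()
explored, transitions = relation.get()
print(sorted(transitions))
```

Two threads may occasionally learn the same short state twice; because the update is a union, this only costs some duplicated work and never loses transitions.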
4 Contributed Tools
We present several new tools and extensions to existing tools produced in this work. The new tools support experiments and comparisons between various DD formats. The extensions to Sylvan and LTSmin provide end users with multi-core saturation for reachability analysis.
4.1 Tools for Experimental Purposes
For the empirical evaluation, we need to isolate the reachability analysis of a given LDD (or BDD or MDD). To that end, we implemented three small tools that only compute the set of reachable states, namely lddmc for LDDs, bddmc for BDDs and medmc for MDDs using the library Meddly. These tools are given an input file representing the model, compute the set of reachable states, and report the number of states and the time required to compute all reachable states. Additionally, we provide two tools that convert an LDD file to a BDD file and to an MDD file, respectively. The LDD input files are generated using LTSmin (see below). These tools can all be found online^{1}.
4.2 Tools for On-The-Fly Multi-core Saturation
On-the-fly multi-core saturation is implemented in the LTSmin toolset, which can be found online^{2}. The examples in this section are also online^{3}. On-the-fly multi-core saturation for Petri nets is available in LTSmin’s tool pnml2lts-sym. This tool computes all reachable markings with parallel saturation. When it is run on the Petri net of Fig. 1 (the command line and its output are part of the online examples), it reports, among other things, that the final LDD has 16 nodes.
Here the syntactic variable order of the places in the PNML file is used. To use a better variable order, a reordering option is added to the command line. For instance, adding -rf runs Force, while -rbs runs Sloan’s algorithm (as implemented in the well-known Boost library). Running the tool with such a reordering reports that the final LDD has only 12 nodes.
The naming convention of LTSmin’s binaries follows the Partitioned Next-State Interface (PINS) architecture [5, 22, 25]. PINS forms a bridge between several language frontends and algorithmic backends. Consequently, besides pnml2lts-sym, LTSmin provides binaries for many other frontend/backend combinations, for example binaries that generate the state space for the languages PNML, DVE and Promela by means of distributed explicit-state, multi-core explicit-state and multi-core symbolic algorithms, respectively. Additionally, LTSmin supports checking for deadlocks and invariants, and verifying LTL properties and \(\mu \)-calculus formulas. In this work we focus on state space generation with the symbolic backend only.
We now demonstrate multi-core saturation for Promela models. Consider a Promela model of the GARP protocol [23]. To compute its reachable state space with the proposed algorithm and the Force order, run LTSmin’s symbolic Promela tool with the corresponding options (the exact command lines are part of the online examples). On a consumer laptop with 8 hardware threads, LTSmin reports 385,000,995,634 reachable states within 1 min. With a single worker, the same computation takes 4 min on the same laptop. We thus obtain a speedup of 4\(\times \) with 8 workers for symbolic saturation on a Promela model.
5 Empirical Evaluation
Our goal with the empirical study is fivefold. First, we compare our parallel implementation with only 1 core to the purely sequential implementation of the MDD library Meddly [4], in order to determine whether our implementation is competitive with the state-of-the-art. Second, we study parallel scalability up to 16 cores for all models and up to 48 cores with a small selection of models. Third, we compare parallel saturation with LDDs to parallel saturation with ordinary BDDs, to see if we get similar results with BDDs. Fourth, we compare parallel saturation without on-the-fly transition learning to on-the-fly parallel saturation, to see the effects of on-the-fly transition learning on the performance of the algorithm. Fifth, we compare parallel saturation with other reachability strategies, namely chaining and BFS, to confirm whether saturation is indeed a better strategy than chaining and BFS.
Table 1. The six solving methods that we use in the empirical evaluation. Five methods are parallelized and one method is on-the-fly (OTF).

Method | Tool | Description | Input | Parallel | OTF
--- | --- | --- | --- | --- | ---
otflddsat | pnml2lts-sym | saturation | PNML | yes | yes
lddsat | lddmc | saturation | LDD | yes | no
lddchaining | lddmc | chaining | LDD | yes | no
lddbfs | lddmc | BFS | LDD | yes | no
bddsat | bddmc | saturation | BDD | yes | no
mddsat | medmc | saturation in Meddly | MDD | no | no
Table 2. Number of benchmarks (out of 413) solved within 20 min with each method, for the given number of workers.

Method | 1 | 2 | 4 | 8 | 16 | Any
--- | --- | --- | --- | --- | --- | ---
otflddsat | 387 | 397 | 399 | 404 | 407 | 408
lddsat | 388 | 393 | 399 | 402 | 402 | 404
lddchaining | 351 | 354 | 360 | 367 | 371 | 371
lddbfs | 325 | 331 | 347 | 360 | 362 | 362
bddsat | 395 | 396 | 401 | 402 | 403 | 405
mddsat | 375 | – | – | – | – | 375
See Table 1 for the list of solving methods. As described in Sect. 4, we implement the tools lddmc, bddmc and medmc to isolate reachability computation for the purposes of this comparison, using respectively the LDDs and BDDs of Sylvan and the MDDs of Meddly. The on-the-fly parallel saturation using LDDs is performed with the pnml2lts-sym tool of LTSmin. We use the command line pnml2lts-sym ORDER --lace-workers=WORKERS --saturation=sat FILE, where ORDER is -rf for Force and -rbs for Sloan, and WORKERS is a number from the set \(\{1,2,4,8,16\}\).
All experimental scripts, input files and log files are available online (see footnote 3). The experiments are performed on a cluster of Dell PowerEdge M610 servers with two Xeon E5520 processors and 24 GB internal memory each. The tools are compiled with gcc 5.4.0 on Ubuntu 16.04. The experiments for up to 48 cores are performed on a single computer with 4 AMD Opteron 6168 processors with 12 cores each and 128 GB internal memory.
When reporting on parallel executions, the number of workers denotes the number of hardware threads (cores) used.
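Table 3. Cumulative time and parallel speedups for each combination of method and number of workers, on the models where all methods solved the model in time. These are 301 models in total: 151 models with Force, 150 models with Sloan. t(w) is the total time in seconds with w workers; S(w) is the total speedup with w workers relative to 1 worker.

Method | Order | t(1) | t(2) | t(4) | t(8) | t(16) | S(2) | S(4) | S(8) | S(16)
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
otflddsat | Sloan | 1850 | 1546 | 698 | 398 | 313 | 1.2 | 2.7 | 4.6 | 5.9
lddsat | Sloan | 932 | 609 | 311 | 194 | 151 | 1.5 | 3.0 | 4.8 | 6.2
lddchaining | Sloan | 4156 | 3019 | 1916 | 1121 | 863 | 1.4 | 2.2 | 3.7 | 4.8
lddbfs | Sloan | 9030 | 5585 | 2990 | 1652 | 1219 | 1.6 | 3.0 | 5.5 | 7.4
bddsat | Sloan | 708 | 419 | 212 | 139 | 115 | 1.7 | 3.3 | 5.1 | 6.1
mddsat | Sloan | 572 | – | – | – | – | – | – | – | –
otflddsat | Force | 2704 | 1162 | 712 | 401 | 343 | 2.3 | 3.8 | 6.8 | 7.9
lddsat | Force | 856 | 602 | 348 | 216 | 180 | 1.4 | 2.5 | 4.0 | 4.7
lddchaining | Force | 3149 | 2560 | 1835 | 1160 | 1024 | 1.2 | 1.7 | 2.7 | 3.1
lddbfs | Force | 4696 | 2951 | 1556 | 859 | 633 | 1.6 | 3.0 | 5.5 | 7.4
bddsat | Force | 1041 | 733 | 384 | 253 | 206 | 1.4 | 2.7 | 4.1 | 5.1
mddsat | Force | 1738 | – | – | – | – | – | – | – | –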
To compare all runtimes fairly, we restrict the set of models to those where all methods finish within 20 min for every number of workers. In total, we retain 301 models where no solver hit the timeout. See Table 3 for the cumulative times for each method and number of workers, and the resulting parallel speedup. Notice that this is the speedup for the entire set of 301 models and not for individual models.
Comparing LDD saturation with Meddly’s saturation. We evaluate how lddsat with just 1 worker compares to the sequential saturation of Meddly. The goal is not to directly measure whether there is a parallel overhead from using parallelism in Sylvan, as the algorithm in lddmc is fundamentally different: it uses LDDs instead of MDDs and it does not saturate nodes in-place, as also explained in Sect. 3. The low parallel overheads of Sylvan are already demonstrated elsewhere [15, 16, 18]. Rather, the goal is to see how our version of saturation compares to the state-of-the-art.
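Because the total speedup divides summed times, a few long-running models dominate it, whereas an average of per-model speedups weighs every model equally. The following Python sketch makes this difference concrete. The aggregate figures come from the lddsat/Sloan row of Table 3; the per-model times are hypothetical, chosen only to mimic one large model that scales well and several sub-second models that slow down with more workers.

```python
def total_speedup(t1, tw):
    """Speedup of a whole benchmark set: summed 1-worker time / summed w-worker time."""
    return sum(t1) / sum(tw)


def mean_speedup(t1, tw):
    """Unweighted mean of the per-model speedups."""
    return sum(a / b for a, b in zip(t1, tw)) / len(t1)


# Totals from Table 3 (lddsat, Sloan): 932 s with 1 worker, 151 s with 16.
print(round(total_speedup([932], [151]), 1))  # 6.2

# Hypothetical per-model times in seconds (not measured data).
t1 = [100.0, 0.5, 0.5, 0.5]
t16 = [10.0, 1.0, 1.0, 2.0]
print(total_speedup(t1, t16))  # 7.25
print(mean_speedup(t1, t16))   # 2.8125
```

The same set of runs thus yields a high total speedup and a much lower average per-model speedup, which is consistent with the gap between the totals in Table 3 and the average per-model speedup reported for individual benchmarks.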
Table 2 shows that Meddly’s implementation (mddsat) and our implementation (lddsat 1) are quite similar in the number of solved models. Meddly solves 375 benchmarks and our implementation solves 388 within 20 min.
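Table 4. Parallel speedup for a selection of benchmarks on the 48-core machine (only the top 5 per method are shown). t(w) is the time in seconds with w workers; S(w) is the speedup with w workers.

Method | Model | Order | t(1) | t(24) | t(48) | S(24) | S(48)
--- | --- | --- | --- | --- | --- | --- | ---
lddsat | Dekker-PT-015 | Sloan | 77.3 | 4.7 | 2.4 | 16.3 | 32.5
lddsat | PhilosophersDyn-PT-10 | Force | 273.8 | 16.8 | 12.4 | 16.3 | 22.1
lddsat | Angiogenesis-PT-10 | Sloan | 333.2 | 28.5 | 16.5 | 11.7 | 20.2
lddsat | SwimmingPool-PT-02 | Force | 25.0 | 2.1 | 1.4 | 11.6 | 17.8
lddsat | BridgeAndVehicles-PT-V20P10N20 | Force | 1035.8 | 101.8 | 60.7 | 10.2 | 17.1
otflddsat | Dekker-PT-015 | Sloan | 174.5 | 7.4 | 3.3 | 23.6 | 52.2
otflddsat | SwimmingPool-PT-07 | Sloan | 1008.0 | 69.2 | 42.0 | 14.6 | 24.0
otflddsat | SmallOperatingSystem-PT-MT0256DC0064 | Sloan | 957.3 | 52.9 | 40.0 | 18.1 | 23.9
otflddsat | Kanban-PT-0050 | Sloan | 940.6 | 78.7 | 48.9 | 11.9 | 19.2
otflddsat | TCPcondis-PT-10 | Force | 68.4 | 5.7 | 3.8 | 11.9 | 17.8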
Parallel Scalability. As shown in Table 3, using 16 workers, we obtain a modest parallel speedup for saturation (lddsat) of 6.2\(\times \) with Sloan and 4.7\(\times \) with Force. On individual models, the differences are large. The average speedup of the individual benchmarks is only 1.8\(\times \) with 16 workers, but this average includes many slowdowns on models that take less than a second with 1 worker. We took an arbitrary selection of models with a high parallel speedup and ran these on a dedicated 48-core machine. Table 4 shows that parallel speedup keeps improving up to 48 cores, even reaching 52.2\(\times \). For this superlinear speedup we have two possible explanations. One is that there is some nondeterminism inherent in any parallel computation; the other, already noted in [20], is related to the “chaining” in saturation.
Comparing LDD saturation with BDD saturation. As Table 3 shows, the lddsat and bddsat methods have a similar performance and similar parallel speedups.
On-the-fly LDD saturation. Comparing saturation with pre-learned transition relations (lddsat) to on-the-fly saturation (otflddsat), we observe similar scalability with the Sloan variable order, but on-the-fly saturation requires roughly 2\(\times \) as much time. With the Force variable order, on-the-fly saturation is slower but has a higher parallel speedup of 7.9\(\times \).
6 Conclusion
We presented a multi-core implementation of saturation for the efficient computation of the set of reachable states. Based on Sylvan’s multi-core decision diagram framework, the design of the saturation algorithm is mostly orthogonal to the type of decision diagram. We showed the implementation for BDDs and LDDs; the transition relation can be learned on-the-fly. The functionality is accessible through the LTSmin high-performance model checker. This makes parallel saturation available for a whole collection of asynchronous specification languages. We demonstrated multi-core saturation for Promela and for Petri nets in PNML representation.
We carried out extensive experiments on a benchmark of Petri nets from the Model Checking Contest. The total speedup of on-the-fly saturation is 5.9\(\times \) on 16 cores with the Sloan variable ordering and 7.9\(\times \) with the Force variable ordering. However, there are many small models (computed in less than a second) in this benchmark. For some larger models we showed an impressive 52\(\times \) speedup on a 48-core machine. From our measurements, we further conclude that the efficiency and parallel speedup of the BDD variant are just as good as those of the LDD variant. We compared efficiency and speedup of saturation versus other popular exploration strategies, BFS and chaining. As expected, saturation is significantly faster than chaining, which is faster than BFS; this trend is maintained in the parallel setting. Our measurements show that the variable ordering (Sloan versus Force) and the model representation (precomputed transition relations versus learned on-the-fly) do have an impact on efficiency and speedup. Parallel speedup should not come at the price of reduced efficiency. To this end, we compared our parallel saturation algorithm for one worker to saturation in Meddly. Meddly solves fewer models within the timeout, but is slightly faster in other cases; however, parallel saturation quickly overtakes Meddly with multiple workers.
Future work could include the study of parallel saturation on exciting new BDD types, such as tagged BDDs [19] and chained BDDs [8]. The results on tagged BDDs showed a significant speedup compared to ordinary BDDs on experiments in LTSmin with the BEEM benchmark database. Another direction would be to investigate the efficiency and speedup of parallel saturation in other applications, such as CTL model checking, SCC decomposition, and bisimulation reduction.
Footnotes
References
 1.Aloul, F.A., Markov, I.L., Sakallah, K.A.: FORCE: a fast and easytoimplement variableordering heuristic. In: VLSI 2003, pp. 116–119. ACM (2003)Google Scholar
 2.Amparore, E.G., Beccuti, M., Donatelli, S.: Gradientbased variable ordering of decision diagrams for systems with structural units. In: D’Souza, D., Narayan Kumar, K. (eds.) ATVA 2017. LNCS, vol. 10482, pp. 184–200. Springer, Cham (2017). https://doi.org/10.1007/9783319681672_13CrossRefGoogle Scholar
 3.Amparore, E.G., Donatelli, S., Beccuti, M., Garbi, G., Miner, A.S.: Decision diagrams for Petri nets: which variable ordering? In: PNSE @ Petri Nets. CEUR Workshop Proceedings, vol. 1846, pp. 31–50. CEURWS.org (2017)Google Scholar
 4.Babar, J., Miner, A.S.: Meddly: multiterminal and edgevalued decision diagram library. In: QEST, pp. 195–196. IEEE Computer Society (2010)Google Scholar
 5.Blom, S., van de Pol, J., Weber, M.: LTSmin: distributed and symbolic reachability. In: Touili, T., Cook, B., Jackson, P. (eds.) CAV 2010. LNCS, vol. 6174, pp. 354–359. Springer, Heidelberg (2010). https://doi.org/10.1007/9783642142956_31CrossRefGoogle Scholar
 6.Bollig, B., Wegener, I.: Improving the variable ordering of OBDDs is NPcomplete. IEEE Trans. Comput. 45(9), 993–1002 (1996)CrossRefGoogle Scholar
 7.Bryant, R.E.: Graphbased algorithms for Boolean function manipulation. IEEE Trans. Comput. C–35(8), 677–691 (1986)CrossRefGoogle Scholar
 8.Bryant, R.E.: Chain reduction for binary and zerosuppressed decision diagrams. In: Beyer, D., Huisman, M. (eds.) TACAS 2018. LNCS, vol. 10805, pp. 81–98. Springer, Cham (2018). https://doi.org/10.1007/9783319899602_5CrossRefGoogle Scholar
 9.Chung, M., Ciardo, G.: Saturation NOW. In: QEST, pp. 272–281. IEEE Computer Society (2004)Google Scholar
10. Chung, M., Ciardo, G.: Speculative image computation for distributed symbolic reachability analysis. J. Logic Comput. 21(1), 63–83 (2011)
11. Ciardo, G., Lüttgen, G., Siminiceanu, R.: Saturation: an efficient iteration strategy for symbolic state-space generation. In: Margaria, T., Yi, W. (eds.) TACAS 2001. LNCS, vol. 2031, pp. 328–342. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45319-9_23
12. Ciardo, G., Marmorstein, R.M., Siminiceanu, R.: Saturation unbound. In: Garavel, H., Hatcliff, J. (eds.) TACAS 2003. LNCS, vol. 2619, pp. 379–393. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36577-X_27
13. Ciardo, G., Marmorstein, R.M., Siminiceanu, R.: The saturation algorithm for symbolic state-space exploration. STTT 8(1), 4–25 (2006)
14. Ciardo, G., Zhao, Y., Jin, X.: Parallel symbolic state-space exploration is difficult, but what is the alternative? In: PDMC, EPTCS, vol. 14, pp. 1–17 (2009)
15. van Dijk, T.: Sylvan: multi-core decision diagrams. Ph.D. thesis, University of Twente, July 2016
16. van Dijk, T., van de Pol, J.: Sylvan: multi-core framework for decision diagrams. STTT 19(6), 675–696 (2017)
17. van Dijk, T., van de Pol, J.: Multi-core symbolic bisimulation minimisation. STTT 20(2), 157–177 (2018)
18. van Dijk, T., van de Pol, J.C.: Lace: non-blocking split deque for work-stealing. In: Lopes, L., et al. (eds.) Euro-Par 2014. LNCS, vol. 8806, pp. 206–217. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-14313-2_18
19. van Dijk, T., Wille, R., Meolic, R.: Tagged BDDs: combining reduction rules from different decision diagram types. In: FMCAD, pp. 108–115. IEEE (2017)
20. Ezekiel, J., Lüttgen, G., Ciardo, G.: Parallelising symbolic state-space generators. In: Damm, W., Hermanns, H. (eds.) CAV 2007. LNCS, vol. 4590, pp. 268–280. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-73368-3_31
21. Heyman, T., Geist, D., Grumberg, O., Schuster, A.: A scalable parallel algorithm for reachability analysis of very large circuits. Formal Methods Syst. Des. 21(3), 317–338 (2002)
22. Kant, G., Laarman, A., Meijer, J., van de Pol, J., Blom, S., van Dijk, T.: LTSmin: high-performance language-independent model checking. In: Baier, C., Tinelli, C. (eds.) TACAS 2015. LNCS, vol. 9035, pp. 692–707. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46681-0_61
23. Konnov, I., Letichevsky, O.: Model checking GARP protocol using Spin and VRS. In: IW on Automata, Algorithms, and Information Technology (2010)
24. Kordon, F., et al.: Complete Results for the 2016 Edition of the Model Checking Contest, June 2016. http://mcc.lip6.fr/2016/results.php
25. Meijer, J., Kant, G., Blom, S., van de Pol, J.: Read, write and copy dependencies for symbolic model checking. In: Yahav, E. (ed.) HVC 2014. LNCS, vol. 8855, pp. 204–219. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-13338-6_16
26. Meijer, J., van de Pol, J.: Bandwidth and wavefront reduction for static variable ordering in symbolic reachability analysis. In: Rayadurgam, S., Tkachuk, O. (eds.) NFM 2016. LNCS, vol. 9690, pp. 255–271. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40648-0_20
27. Noack, A.: A ZBDD package for efficient model checking of Petri nets. Forschungsbericht, Brandenburgische Technische Universität Cottbus (1999)
28. Oortwijn, W., van Dijk, T., van de Pol, J.: Distributed binary decision diagrams for symbolic reachability. In: 24th ACM SIGSOFT International SPIN Symposium on Model Checking of Software, pp. 21–30 (2017)
29. Siminiceanu, R., Ciardo, G.: New metrics for static variable ordering in decision diagrams. In: Hermanns, H., Palsberg, J. (eds.) TACAS 2006. LNCS, vol. 3920, pp. 90–104. Springer, Heidelberg (2006). https://doi.org/10.1007/11691372_6
30. Sloan, S.W.: A FORTRAN program for profile and wavefront reduction. Int. J. Numer. Methods Eng. 28(11), 2651–2679 (1989)
31. Vörös, A., Szabó, T., Jámbor, A., Darvas, D., Horváth, Á., Bartha, T.: Parallel saturation based model checking. In: ISPDC, pp. 94–101. IEEE Computer Society (2011)
32. Zhao, Y., Ciardo, G.: Symbolic CTL model checking of asynchronous systems using constrained saturation. In: Liu, Z., Ravn, A.P. (eds.) ATVA 2009. LNCS, vol. 5799, pp. 368–381. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04761-9_27
33. Zhao, Y., Ciardo, G.: Symbolic computation of strongly connected components and fair cycles using saturation. ISSE 7(2), 141–150 (2011)
34. van Dijk, T., van de Pol, J., Meijer, J.: Artifact and instructions to generate experimental results for TACAS 2019 paper: Multi-core On-The-Fly Saturation (artifact). Figshare (2019). https://doi.org/10.6084/m9.figshare.7825406.v1
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.