## 1 Introduction

Model checking has proven effective for automatically verifying correctness of protocols, controllers, schedulers and other systems. Because a model checker tool relies on the exhaustive exploration of the system’s state space, its power depends on efficient storage of states.

To illustrate the structure of typical states in model checking problems, consider Lamport’s Bakery algorithm in Fig. 1, a mutual exclusion protocol that mimics a bakery with a numbering machine to prioritize customers. Due to limitations of computing hardware, the number is not maintained globally but reconstructed from local counters in N[i] (for each process i). For two processes, the state vector of this program consists of the two program counters ($$\text {pc}$$s) and all variables, i.e., $$\left\langle \texttt {E} , N, \text {pc} _0, \texttt {E} , N, \text {pc} _1 \right\rangle$$ (Footnote 1). Their respective domains are:

\begin{aligned} \left\langle \left\{ \top , \bot \right\} , [0\ldots 2], [0\ldots 7],\left\{ \top , \bot \right\} , [0\ldots 2], [0\ldots 7] \right\rangle . \end{aligned}

There are $$2 \times 3 \times 8 \times 2 \times 3 \times 8 = 2304$$ possible state vectors. The task of the model checker is to determine which of those are reachable from the initial state; here $$\iota \,\triangleq \,\left\langle \bot , 0, 0, \bot , 0, 0 \right\rangle$$. It does this using a next-state function, which in this case implements the semantics of the Bakery algorithm to compute the successor states of any state. The next-state function furthermore accounts for all parallel interleavings, by implementing a worst-case scheduler. For example, the successors of the initial state are:

\begin{aligned}&\textsf { \textsc {next-state}}(\left\langle \bot , 0, 0, \bot , 0, 0 \right\rangle )\\&\quad = \{ \left\langle \top , 0, 1, \bot , 0, 0 \right\rangle , \left\langle \bot , 0, 0, \top , 0, 1 \right\rangle \} \end{aligned}

One successor represents the case where the first process executed Line 0; its program counter is set to 1, and $$\texttt {E} $$ is updated as a consequence. Similarly, the other successor represents the case where the second process executed Line 0.

While exhaustively exploring all reachable states, the model checker searches whether it can reach a state from the set $$\text {Error}$$. For the Bakery algorithm with two threads, errors are violations of the mutual exclusion principle:

\begin{aligned} \text {Error} \,\triangleq \,\{ \left\langle b_0, n_0, 7, b_1, n_1, 7 \right\rangle \mid b_0,b_1 \in \left\{ \top , \bot \right\} ,\\ n_0,n_1\in [0\ldots 2] \} \end{aligned}

So, $$\text {Error}$$ is a collection of all states where both processes reside in their critical section, i.e., where both their program counter locations equal 7.

For completeness, Algorithm 1 shows the basic reachability procedure. The more states the reachability procedure can process, the more powerful the model checker is, i.e., the larger program instances it can automatically verify. This number depends crucially on the size of the visited states set V in memory. Several techniques exist to reduce V: partial-order reduction [24, 38], symmetry reduction [15, 43], BDDs [3, 9], etc. Here, we focus on explicitly storing the states in V using state compression.
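Algorithm 1 itself is not reproduced in this text. As a rough illustration only, the following is a minimal sketch of a standard explicit-state reachability loop built around a visited set V and a queue Q. The names `reachable`, `toy_next_state` and `is_error` are assumptions made for this example, and the toy transition system is deliberately not the Bakery algorithm.

```python
from collections import deque

def reachable(initial, next_state, is_error):
    """Explore all states reachable from `initial` (breadth-first).

    Returns (visited, first_error); first_error is None if no error
    state was encountered.
    """
    visited = {initial}        # the visited set V
    queue = deque([initial])   # the queue Q of states still to expand
    while queue:
        s = queue.popleft()
        if is_error(s):
            return visited, s
        for t in next_state(s):
            if t not in visited:   # the lookup that V must support efficiently
                visited.add(t)
                queue.append(t)
    return visited, None

# Toy system (an assumption, NOT the Bakery algorithm): two counters in
# [0..2]; every step increments exactly one of them, mimicking interleaving.
def toy_next_state(s):
    a, b = s
    successors = []
    if a < 2:
        successors.append((a + 1, b))
    if b < 2:
        successors.append((a, b + 1))
    return successors

visited, error = reachable((0, 0), toy_next_state, lambda s: False)
```

In this sketch, the Python `set` plays the role of V; the remainder of the article is concerned with replacing exactly that component by a compressed structure.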

The potency of compression becomes apparent from two related observations:

• Locality: Successors computed in the next-state function exhibit locality, e.g.,

\begin{aligned}&\textsf { \textsc {next-state}}(\left\langle \bot , 1, 4, \bot , 2, 6 \right\rangle ) \\&\quad = \{ \left\langle \bot , 1, \mathbf 5 , \bot , 2, 6 \right\rangle , \left\langle \bot , 1, 4, \bot , 2, \mathbf 7 \right\rangle \} \end{aligned}

Note that only program counters change value (marked bold in successors).

• Combinatorics: Similar to the set of all possible state vectors, the set of reached state vectors is also highly combinatorial. Assuming $$\left\langle \bot , 1, 4, \bot , 2, 6 \right\rangle$$ can be reached from the initial state $$\iota$$, we indeed saw four different vectors sharing large sub-vectors with their predecessors (underlined here):

$$\left\langle \bot , 0, 0, \bot , 0, 0 \right\rangle \longrightarrow \left\langle \top , \underline{0}, 1, \underline{ \bot , 0, 0} \right\rangle , \quad \left\langle \underline{ \bot , 0, 0}, { \top }, \underline{0}, 1 \right\rangle$$

$$\left\langle \bot , 1, 4, \bot , 2, 6 \right\rangle \longrightarrow \left\langle \underline{ \bot , 1}, 5, \underline{ \bot , 2, 6} \right\rangle , \quad \left\langle \underline{ \bot , 1, 4, \bot , 2}, 7 \right\rangle$$

We hypothesize that the typical locality of the next-state function ensures that the set of reachable states exhibits this combinatorial structure in the limit. Therefore, storing each vector in its entirety, e.g., in a hash table, would duplicate a lot of data. By folding the reachable state vectors in a tree, however, these shared sub-vectors only have to be stored once (more in Sect. 3).

In this article, we investigate the lower bound on the space requirements of typical state spaces occurring in model checking consisting of n vectors with k slots of u bits each. We do this by modeling the state spaces as an information stream. The values in this stream probabilistically depend on previously seen values, in effect modeling the locality in the next-state function. A simple application of Shannon’s information theory yields a lower bound of approximately $$u + \log _2(k) + \epsilon$$ bits for the storage requirements of our “state space stream” (see Theorem 1), far below the uncompressed space requirements of $$k\cdot u$$ bits.

Subsequently, in Sects. 3 and 4, we investigate whether this lower bound can be reached in practice. To this end, we provide an implementation for the visited set V. A practical compressed data structure has an additional requirement that the query time, the time it takes to look up and insert individual state vectors, is constant with respect to the number of states stored. The storage technique suggested by the information theoretical model, i.e., maintaining differences between successor states, does not satisfy this requirement as it would require the iteration of all inserted vectors in the worst case. Therefore, we utilize a binary tree in combination with a compact hash table.

In our Compact Tree, the compact hash table has $$2^{w+o}$$ buckets storing tuples of w-bit sized pointers from the binary tree. It does so compactly by using the storage location in memory as information. Through this technique, it can chop off $$w + o - 3$$ bits from the 2w bits of each tuple. (The three bits are required for bookkeeping.) Under reasonable assumptions on the state length k, number of states n, pointer size w and state slot size u, we analytically show that the Compact Tree can even surpass the information theoretical lower bound when $${w-o+6} \le u$$ (see Theorem 5; Footnote 2).

According to the same best-case analysis, our specific implementation of the Compact Tree can compress arbitrarily large state descriptors down to only one 32-bit integer per state (our benchmarks contain inputs with state vectors of a thousand bytes long). Extensive experimentation in Sect. 5 with diverse input models in five different input languages shows moreover that this compression is also reached in practice, and with little computational overhead due to incremental insertion in the tree (see Sect. 3.3).

Surprising is perhaps that the Compact Tree can compress states down to x bits while storing far more than $$2^{x}$$ states. This property of representing a set with less space per element than required for its unique identification is inherited from the compact hash table. As mentioned above, this is realized by using the place in memory as information. Indeed, the case study in Sect. 5.9 demonstrates that the Compact Tree implementation can store $$2^{35.6}$$ states using only 32.5 bits per state.

• New experiments using the ProB models [2, 28], adding more evidence that compact tree compression works regardless of the input language.

• An extensive comparison of compact tree compression with both Binary Decision Diagram and Multi-valued Decision Diagrams in Sect. 5.5.

• A better lower bound in Theorem 5 and additional derivations and proofs for the lower bounds in Theorem 1 and Theorem 5.

• A case study of the GARP specification from  in Sect. 5.9 demonstrating that the Compact Tree implementation performs as predicted.

## 2 An information theoretical lower bound

The fact that state spaces have combinatorial values is related to the fact that states generated by a model checker exhibit locality as we discussed in Sect. 1. We will make no assumptions on the nature of the inputs, besides the locality of state generation. In the current section, we will derive the information entropy—which is equal to the minimum number of bits needed for its storage—of a single state vector using basic notions from information theory.

Information theory abstracts away from the computational nature of a program by considering sender and receiver as black boxes that communicate data (signals) via a channel. The goal for the sender is to encode the data as compactly as possible, such that the receiver is still able to decode it back to the original. The encoded size depends on the amount of entropy in the data. In the most basic case, no statistical information is known about the data: each of the $$|X|$$ possible messages is equally likely, and the entropy H is maximal: $$H(X) = \log _2(|X|)$$ bits, i.e., the entropy directly corresponds to using one fixed-size bit pattern of $$\log _2(|X|)$$ bits for each possible message.

If more is known about the statistical nature of the information coming from the sender, the entropy is lower as more elaborate encodings can be used to reduce the number of bits needed per piece of information. A simple example is when we take into account the character frequency of the English language for encoding sentences. Assuming that certain characters are much more frequent, a code of fewer bits can be used for them, while longer codes can be reserved for infrequent characters. To calculate the entropy in this example, we need the probability of occurrence p(x) for each character $$x\in X$$ in the English language. We can deduce this from analyzing a dictionary, or better a large corpus of texts. The entropy then becomes: $$H(X) = \sum _{x \in X}-p(x)\log _2(p(x))$$
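To make the two entropy formulas concrete, here is a small sketch that evaluates them for a four-symbol alphabet. The skewed probabilities are made up for illustration and are not actual English character frequencies; the function name `entropy` is likewise an assumption of this example.

```python
import math

def entropy(probs):
    """Shannon entropy H(X) = sum of -p(x) * log2(p(x)), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Four equally likely symbols: the maximal case H(X) = log2(|X|).
uniform = [0.25, 0.25, 0.25, 0.25]

# One frequent symbol and three rare ones (illustrative values only).
skewed = [0.7, 0.1, 0.1, 0.1]

h_uniform = entropy(uniform)
h_skewed = entropy(skewed)
```

The uniform case reproduces $$\log _2(4)$$ bits per symbol, while the skewed distribution needs fewer bits on average, which is the effect exploited in the remainder of this section.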

We apply the same principle now to state vectors. As data source, we use the next-state function to compute new states, as we saw in Sect. 1:

\begin{aligned} \textsf { \textsc {next-state}}(\left\langle \bot , 1, 4, \bot , 2, 6 \right\rangle ) = \{ \left\langle \bot , 1, \mathbf 5 , \bot , 2, 6 \right\rangle , \ldots \} \end{aligned}

As a simplification, let states consist of k variables. By storing full states in the queue Q, the predecessor state is always known in the model checker’s reachability procedure (see s and $$s'$$ on line 6 in Algorithm 1). Hence, we can abstract away from the one-to-many relation of the next-state function and instead view the arriving states as a k-periodic stream of variable assignments:

\begin{aligned} \left\langle v^0_0,\ldots v^0_{k-1} \right\rangle ,\left\langle v^1_0,\ldots v^1_{k-1} \right\rangle ,\dots , \left\langle v^{n-1}_0,\dots v^{n-1}_{k-1} \right\rangle \end{aligned}

It thus makes sense to describe the probability that a variable holds a certain value with respect to the same variable in the predecessor state: For each variable $$v^{i}_{j}$$ with $$i \ge 0$$ and $$0\le j \le k-1$$, both encoder and decoder can always look at the corresponding variable $$v^{i-1}_{j}$$ in the predecessor to retrieve its previous value.

Since we are interested in establishing a lower bound, we may safely under-approximate the number of variables changing value with respect to a state’s predecessor. It makes sense to assume that only one variable changes value, since with zero changes, the same state is generated (requiring no “new” space in V). Hence, we take the following relative probabilities (see example Fig. 2):

\begin{aligned} p(v^i_j \ne v^{i-1}_j) = \frac{1}{k} \quad \quad p(v^i_j = v^{i-1}_j) = \frac{k-1}{k} \end{aligned}

Let $$\left\langle d^0_0,\dots d^0_{k-1} \right\rangle ,\left\langle d^1_0,\dots d^1_{k-1} \right\rangle ,\dots , \left\langle d^{n-1}_0,\dots d^{n-1}_{k-1} \right\rangle$$ be the domains of the state slots. As a simplification, we assume that all domains have u bits, resulting in $$y=2^{u}$$ values. Hence, there are $$y-1$$ possible values to which variable $$v^{i}_j$$ can change with respect to its predecessor $$v^{i-1}_j$$, and the probability for each of these other values $$x\in d^i_j$$ becomes $$p(x) = \frac{1}{k} \times \frac{1}{y-1}= \frac{1}{k(y-1)}$$ (this equal probability distribution over the possible values results in higher entropy, but recall that we do not make other assumptions on the nature of the inputs). Of course, there is only one value assignment when the variable $$v^{i}_j$$ does not change, i.e., the valuation of the same variable in the predecessor state $$v^{i-1}_j$$ (Footnote 3). This results in the following definition of entropy per variable in the stream:

\begin{aligned} H_{\mathrm{var}}(v^i_j)= & {} -\frac{k-1}{k}\log _2\left( \frac{k-1}{k}\right) \\&+ \sum ^{y-1}_{n=1}-\frac{1}{k(y-1)} \log _2\left( \frac{1}{k(y-1)}\right) \end{aligned}

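After some simplification (summing the per-variable entropy over the k variables of a state), we can derive the state vector’s entropy:

\begin{aligned} H_{\text {state}} = k \cdot H_{\mathrm{var}} = \log _2(y-1) + \log _2(k-1) + k \log _2\left( \frac{k}{k-1}\right) \quad (1) \end{aligned}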

### Theorem 1

(Information Entropy of States Exhibiting Locality) For $$k>1$$, the information entropy of state vectors in state spaces exhibiting locality, abbreviated with $$H_{\text {state}}$$, is bounded by:

\begin{aligned}&\log _2(y-1) + \log _2(k-1) + 1 \le H_\mathrm{state} \\&\quad \le \log _2(y) + \log _2(k) + 2 = u+ \log _2(k) + 2 \end{aligned}

### Proof

We first show that $$H_{\text {state}} \le \log _2(y) + \log _2(k) + 2 = u+ \log _2(k) + 2$$.

\begin{aligned}&\log _2(y{-1}) + \log _2(k{-1}) + k \log _2(\frac{k}{k-1}) \\&\quad {\mathop {\le }\limits ^{?}}\log _2(y) + \log _2(k) + 2\\&\log _2(y) + \log _2(k) + k \log _2(\frac{k}{k-1}) \\&\quad {\mathop {\le }\limits ^{?}}\log _2(y) + \log _2(k) + 2 \quad {[\hbox {increase left}]} \\&k \log _2(\frac{k}{k-1}) {\mathop {\le }\limits ^{?}}2\\&\log _2(\frac{k}{k-1}) {\mathop {\le }\limits ^{?}}\nicefrac {2}{k}\\&\frac{k}{k-1} {\mathop {\le }\limits ^{?}}\root k \of {4}\\&1 + \frac{1}{k-1} {\mathop {\le }\limits ^{?}}\root k \of {4}\\&(1 + \frac{1}{k-1})^k {\mathop {\le }\limits ^{?}}4 \end{aligned}

For $$k=2$$ (recall that $$k > 1$$), we have $$(1 + \frac{1}{k-1})^k = 4$$. In the range $$[2,\infty )$$, this function monotonically decreases toward the limit $$\lim _{k \rightarrow \infty }(1 + \frac{1}{k-1})^k = \lim _{k \rightarrow \infty }(1 + \frac{1}{k})^k = e$$. Hence, it holds that $$(1 + \frac{1}{k-1})^k \le 4$$ and $$H_{\text {state}} \le \log _2(y) + \log _2(k) + 2 = u+ \log _2(k) + 2$$ for $$k> 1$$.

Now, we show that $$\log _2(y-1) + \log _2(k-1) + 1 \le H_{\text {state}}$$.

\begin{aligned}&\log _2(y-1) + \log _2(k-1) + 1 {\mathop {\le }\limits ^{?}}\log _2(y{-1}) \\&\quad + \log _2(k{-1}) + k \log _2(\frac{k}{k-1})\\&1 {\mathop {\le }\limits ^{?}}k \log _2(\frac{k}{k-1})\\&\nicefrac 1k {\mathop {\le }\limits ^{?}}\log _2(\frac{k}{k-1}) \\&\root k \of {2} {\mathop {\le }\limits ^{?}}\frac{k}{k-1} \\&\root k \of {2} {\mathop {\le }\limits ^{?}}1 + \frac{1}{k-1} \\&2 {\mathop {\le }\limits ^{?}}(1 + \frac{1}{k-1})^k \end{aligned}

Again for $$k=2$$, we have $$(1 + \frac{1}{k-1})^k = 4$$ and $$\lim _{k \rightarrow \infty }(1 + \frac{1}{k-1})^k = e$$ (monotonically decreasing), hence $$\log _2(y-1) + \log _2(k-1) + 1 \le H_{\text {state}}$$. $$\square$$

The entropy of states $$H_{\text {state}}$$ provides a lower bound on the number of bits required to encode states. Consequently, the upper bound on the entropy in Theorem 1 gives a conservative estimation of the lower bound. Intuitively, the upper bound from Theorem 1 makes sense since a single modification in each new state vector can be encoded with solely the index of the changed variable, in $$\log _2(k)$$ bits, plus its new value, in $$\log _2(y) = u$$ bits, plus some overhead to accommodate cases where more than one variable changes value. This result indicates that locality could allow us to store sets of arbitrarily long ($$k\cdot u$$-bit) state vectors using a small integer of less than $$u +\log _2(k) + 2$$ bits per vector.

In practice, this could mean that vectors of a thousand (1024) byte-size variables can be compressed to 20 bits each, which is only slightly more than if these states were numbered consecutively—in which case the states would be 18 bits—but far less than 8192 bits required for storing the full state vectors.
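The numbers in this example can be checked numerically. The sketch below evaluates the per-variable entropy defined above, multiplies it by k to obtain the entropy of a state (an assumption that agrees with the expression used in the proof of Theorem 1), and compares the result against both bounds for $$k = 1024$$ and $$u = 8$$. The function names are chosen for this illustration only.

```python
import math

def h_var(k, y):
    """Entropy of one variable: it keeps its value with probability
    (k-1)/k, or takes one of the y-1 other values, each with
    probability 1/(k*(y-1))."""
    stay = (k - 1) / k
    change = 1 / (k * (y - 1))
    return -stay * math.log2(stay) - (y - 1) * change * math.log2(change)

def h_state(k, y):
    """Entropy of a whole state vector of k variables."""
    return k * h_var(k, y)

def theorem1_bounds(k, y):
    """Lower and upper bound on the state entropy from Theorem 1."""
    lower = math.log2(y - 1) + math.log2(k - 1) + 1
    upper = math.log2(y) + math.log2(k) + 2
    return lower, upper

k, u = 1024, 8              # 1024 byte-sized variables
y = 2 ** u
lower, upper = theorem1_bounds(k, y)
h = h_state(k, y)
uncompressed_bits = k * u   # size of the full state vector
```

Under these assumptions, `upper` evaluates to the 20 bits mentioned above, `h` lies between the two bounds, and `uncompressed_bits` is 8192.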

## 3 An analysis of binary tree compression

The interpretation of the results in Sect. 2 suggests a trivial data structure to reach the information theoretical lower bound: Simply store incremental differences between state vectors. However, as noted in the introduction, an incremental data structure like that does not provide the required efficiency for lookup operations. (The reachability procedure in Algorithm 1 needs to determine whether states have been visited before on Line 7.)

The current section shows how many state vectors can be folded into a single binary tree of hash tables to achieve sharing among sub-vectors while also achieving poly-logarithmic lookup times in the worst case. This is the first step toward achieving the optimal compression from Sect. 2 in practice. Section 4 presents the second step. We focus here on the analysis of tree compression. For tree algorithms, refer to .

### 3.1 Tree compression

The shape of the binary tree is fixed and depends only on k. Vectors are folded in the tree until only tuples remain. These are stored in the leaves. Using hashing, tuples receive a unique index which is propagated back upwards, again forming new tuples in the tree nodes that can be hashed in turn. This process continues until a tuple is stored in the root node, representing the entire vector.

Figure 3a demonstrates how the state $$\left\langle \bot , 1, 4, \bot , 2, 6 \right\rangle$$ is folded into an empty tree, which consists of $$k-1$$ nodes of empty hash tables storing tuples. The process starts at the root of the tree (a) and recursively visits children while splitting the vector (b). When the leaves of the tree (colored gray) are reached, they are filled with the values from the vector (c). The vectors inserted into the hash tables can be indexed. (We use negative numbers to distinguish indices.) Indices are then propagated back upwards to fill the tree until the root (d).

Using a similar process, we can insert vector $$\left\langle \bot , 1, \mathbf 5 , \bot , 2, 6 \right\rangle$$ (e). The hash table in the left child of the root is extended with a new tuple, stored under index -2, while the root is extended with a new tuple as well. Notice how sub-vector sharing already occurs, since the new tuple in the left child of the root points again to a tuple that is already present in the tree. In (f), the vector $$\left\langle \bot , 1, 4, \bot , 2, 7 \right\rangle$$ is also added. In this case, only the right child of the root needs to be extended, while a new tuple is added to the root.

With these three vectors in the tree (f), we can now easily add a new vector $$\left\langle \bot , 1, 5, \bot , 2, 7 \right\rangle$$ by merely adding one tuple to the root of the tree (g). We observe that an entire state vector (of length k in general) can be compressed to a single tuple of integers in the root of the tree, provided that the sub-vectors are already present in the left and the right sub-tree of the root.
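The folding process can be sketched in a few lines. This is an illustrative toy and not the concurrent implementation evaluated later: the class name `TreeDB`, the use of Python dictionaries as node tables, the positive indices (the figure uses negative ones) and the rule that the left half receives the extra element of an odd-length vector are all assumptions of this example.

```python
class TreeDB:
    """Toy tree database: one table of tuples per tree node."""

    def __init__(self):
        self.tables = {}   # node name -> {tuple: index}

    def _put(self, node, tup):
        """Store `tup` in the table of `node`; return its index."""
        table = self.tables.setdefault(node, {})
        if tup not in table:
            table[tup] = len(table) + 1   # indices 1, 2, 3, ...
        return table[tup]

    def insert(self, vec, node="root"):
        """Fold `vec` (length >= 2) into the tree; return its index."""
        if len(vec) < 2:
            raise ValueError("vectors need at least two slots")
        if len(vec) == 2:
            return self._put(node, (vec[0], vec[1]))    # leaf table
        mid = (len(vec) + 1) // 2                       # assumed split rule
        left, right = vec[:mid], vec[mid:]
        # A half of length one is stored by value, a longer half by index.
        l = self.insert(left, node + ".L") if len(left) > 1 else left[0]
        r = self.insert(right, node + ".R") if len(right) > 1 else right[0]
        return self._put(node, (l, r))

    def tuple_count(self):
        return sum(len(table) for table in self.tables.values())

BOT = False   # stands in for the Boolean value "bottom"
db = TreeDB()
counts = []
for vec in [(BOT, 1, 4, BOT, 2, 6), (BOT, 1, 5, BOT, 2, 6),
            (BOT, 1, 4, BOT, 2, 7), (BOT, 1, 5, BOT, 2, 7)]:
    db.insert(vec)
    counts.append(db.tuple_count())
```

With the assumed split rule, the four example vectors occupy five node tables, and the last vector adds only a single tuple to the root table.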

### 3.2 Analysis of compression ratios

The tree containing four vectors in Fig. 3g uses 20 “places” (= 10 tuples in tree nodes) to store four vectors with a total of 24 variables. The more vectors are added, the more sharing can occur and the better the compression. We now recall the worst-case and the best-case compression ratios for this tree database. We make the following reasonable assumptions about their dimensions:

• The respective database stores $$n = \left| {V}\right|$$ state vectors of k variables of u bits each.

• The size of tree tuples is 2w bits, and w bits are enough to store either a variable valuation (in a leaf) or a tree reference (in a tree node); hence, $$u \le w$$.

• Keys can be stored without overhead in tables (Footnote 4).

• k is a power of 2 (Footnote 5).

Figure 4 provides an overview of the different data structures and the stated assumptions about their dimensions.

To arrive at the worst-case compression scenario (Theorem 2), consider the case where all states $$s\in V$$ have k identical data values: $$V = \{ v^k \mid v\in \{1,\dots ,n\}\}$$, where $$v^k$$ is a vector of length k: $$\left\langle v, \dots ,v \right\rangle$$. No sharing can occur between state vectors in the database, so for each state we store $$k-1$$ tuples at the tree nodes.

### Theorem 2

In the worst case, the tree database requires at most $$k-1$$ tuple entries of 2w bits per state vector.

The best-case scenario (Theorem 3) is easy to comprehend from the effects of a good combinatorial structure on the size of the parent tables in the tree. If a certain tree table contains d tuple entries and its sibling contains e entries, then the parent can have up to $$d\times e$$ entries (all combinations, i.e., the Cartesian product). In a tree that is perfectly balanced ($$d=e$$ for all sibling tables), the root node has n entries (1 per state), its children have $$\sqrt{n}$$ entries, its children’s children $$\root 4 \of {n}$$, etc. Figure 5 depicts this scenario.

Hence, there are a total of $$n + 2\sqrt{n} + 4\root 4 \of {n} + \cdots (\log _2(k) \hbox { terms})\cdots + \nicefrac k2\root {k/2} \of {n}$$ tuple entries. Dividing this series by n gives a series for the expected number of tuple entries per state: $$\sum \nolimits _{i=0}^{\log _2(k)-1} 2^{i} \frac{\root {2^{i}} \of {n}}{n}$$. It is hard to see where this series exactly converges, but Theorem 3 provides an upper bound. The theorem is a refinement of a previously established upper bound. Note that the example above of a tree with the four Bakery algorithm states already represents an optimal scenario, i.e., the root table is the cross product of its children.

### Theorem 3

In the best case and with $$k \ge 8$$, the tree database requires less than $$n + 2\sqrt{n} + \root 4 \of {n} (k-4)$$ tuple entries of 2w bits to store n vectors.

### Proof

In the best case, the root tree table contains n entries and its children both contain $$\sqrt{n}$$ entries. The entries in the four children’s children of the root represent vectors of size $$\nicefrac k4$$. Each of these four tree nodes contains $$\root 4 \of {n}$$ entries, and each such entry requires at most $$\nicefrac k4-1$$ tuples, taking the worst case according to Theorem 2 (hence also $$k \ge 8$$). Together, this gives at most $$4\root 4 \of {n} (\nicefrac k4-1) = \root 4 \of {n} (k-4)$$ tuples below the children of the root. $$\square$$
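As a numerical sanity check, the best-case series from the text can be compared with the bound of Theorem 3. The sketch below uses floating-point roots and function names chosen for this illustration; it assumes that k is a power of 2, as stated in the assumptions above.

```python
import math

def best_case_tuples(n, k):
    """Best-case series n + 2*n^(1/2) + 4*n^(1/4) + ... with log2(k) terms."""
    levels = k.bit_length() - 1   # log2(k) for a power of two
    return sum(2 ** i * n ** (1 / 2 ** i) for i in range(levels))

def theorem3_bound(n, k):
    """Bound of Theorem 3: n + 2*sqrt(n) + n^(1/4) * (k - 4)."""
    return n + 2 * math.sqrt(n) + n ** 0.25 * (k - 4)

n = 2 ** 32
per_state = {k: best_case_tuples(n, k) / n for k in (8, 16, 64, 1024)}
```

For these values the number of tuples per state stays very close to one, which is the observation formalized by the corollaries that follow.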

### Corollary 1

In the best case, the total number of tuple entries l in all descendants of the root table is negligible ($$l \ll n$$), assuming a relatively large number of vectors is stored: $$n \gg k^{2} \gg 1$$.

### Corollary 2

In the best case, the compressed state size approaches 2w bits.

Table 1 lists the achieved compressed sizes for states, as stored in a normal hash table and a tree database. As a simplifying assumption, we take u to be equal to w, which can be the case if the tree is specifically adapted to accommodate u-bit references.

### 3.3 Poly-logarithmic-time tree updates by incremental insertion

The Compact Tree trades one ku-bit vector lookup (in a plain hash table) for $$k-1$$ lookups of 2u-bit tuples in its nodes, assuming $$w \approx u$$. This already makes the query time of the data structure constant with respect to the number of vectors n, as required in Sect. 1. Moreover, the tree requires only a few additional data accesses compared to a hash table, i.e., $$\mathcal {O}(ku - 2u)$$ bits (assuming hash table accesses are indeed amortized constant time).

However, far worse for modern computers with steep memory hierarchies is that the Compact Tree makes $$k-1$$ times more random memory accesses compared to a plain table (which, after all, can store key data consecutively in memory). Luckily, we can exploit locality again to speed up tree lookups by keeping the tree pointers of the predecessor state in the search stack (Q), as explained in . Figure 6 illustrates this. The resulting incremental insertion yields a poly-logarithmic query time of $$\mathcal {O}(\log (k)\times u)$$, which is even faster than $$\mathcal {O}(ku)$$ time required by a plain hash table, but still requires $$\log (k)$$ times more random memory accesses (note that both are still constant in n). Nonetheless, in the case of good compression, the lower tables in the tree typically contain fewer entries which can more easily be cached.

## 4 A novel compact tree

The current section shows how a normal tree database can be extended to reach the information theoretical optimum using a compact hash table.

### 4.1 Hash tables and compact hash tables

A hash table stores a subset of a large universe U of keys and provides the means to look up individual keys in constant time. It uses a hash function to calculate an address h from the unique key. The entire key is then stored at its hash or home location in a table T (an array of buckets): $$T[h] \leftarrow \text {key}$$. Because typically $$|U|\gg |T|$$, multiple keys may have the same hash location. These so-called collisions are handled by calculating alternate hash locations and inserting the key there if empty. This process is known as probing. For this reason, the entire key needs to be stored in the bucket, to distinguish which key is currently mapped to a bucket of T (Fig. 7).

Observe, however, that in the case that $$|U|\le |T|$$, the table can be replaced with a perfect hash function and a bit array. Compact hashing generalizes this idea for the case $$|U|>|T|$$. (The table size is relatively close to the size of the universe.) The compact table first splits a key k into a quotient q(k) and a remainder $$\text {rem}(k)$$, using a reversible operation, e.g., $$q(k) = k \% \left| {T}\right|$$ and $$\text {rem}(k)= k / \left| {T}\right|$$. When the key is $$x = \lceil \log _2(|U|)\rceil$$ bits, the quotient is $$m = \lceil \log _2(|T|)\rceil$$ bits and the remainder is $$r = x - m$$ bits. The quotient is used for addressing in T (like in a normal hash table). Now, only the remainder is stored in the bucket. The complete key can now be reconstructed from the value in T and the home location of the key. If, due to collisions, the key is not stored at its home location, additional information is needed. Cleary solved this problem with little overhead by imposing an order on the keys in T and introducing three administration bits per bucket. For details, see [12, 17, 42]. Because of the administration bits, the bucket size b of compact hash tables is $$b = r + 3$$ bits. The ratio $$\nicefrac bx$$ can approach zero arbitrarily closely, yielding good compression. For instance, a compact table only needs five bits per bucket to store $$2^{30}$$ 32-bit keys.
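The reversible split at the heart of compact hashing can be illustrated as follows. This sketch covers only the splitting and reconstruction of keys and the resulting bucket size; collision handling with Cleary’s three administration bits is deliberately left out, and the function names are assumptions of this example.

```python
def split_key(key, table_bits):
    """Split a key into (home location, remainder) for a table with
    2**table_bits buckets, following q(k) = k % |T| and rem(k) = k / |T|."""
    size = 1 << table_bits
    return key % size, key // size

def join_key(home, remainder, table_bits):
    """Reconstruct the key from its home location and stored remainder."""
    return remainder * (1 << table_bits) + home

def bucket_bits(key_bits, table_bits, admin_bits=3):
    """Bucket size b = r + 3, with remainder size r = x - m."""
    return key_bits - table_bits + admin_bits

key = 0xDEADBEEF                      # an arbitrary 32-bit key
home, rem = split_key(key, 30)        # table with 2**30 buckets
restored = join_key(home, rem, 30)
b = bucket_bits(32, 30)
```

Here only the two-bit remainder `rem` would have to be stored in bucket `home`; together with the three administration bits this gives the five bits per bucket mentioned above.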

### 4.2 Compact tree database

To create a compact tree database, we replace the hash tables in the tree nodes with compact hash tables.

Let the tree references again be w bits; tuples in a tree node table are 2w bits. The tree node table’s universe therefore contains $$2^{2w}$$ tuples. However, tree node tables cannot contain more than $$2^w$$ entries; otherwise, the entries cannot be referenced (with w-bit indices) by parent tree node tables. As the tree’s root table has no parent, it can contain up to $$2^{2w}$$ entries. Let o be the overcommit of the tree root table $$T_{\text {root}}$$, i.e., $$\left| {T_{\text {root}}}\right| = 2^{w+o}$$ for $$0 \le o \le w$$. Overcommitting the root table in the tree can yield better reductions as we will see. However, it also limits the subsets of the state universe that the tree can store. Close-to-worst-case subsets might be rejected as the left or right child ($$2^w$$ tuples max) of the root grows full before the root ($$2^{w+o}$$ tuples max).

We will only focus on replacing the root table $$T_{\text {root}}$$ with a compact hash table as it dominates the tree’s memory usage in the optimal case according to Corollary 1. The following parameters follow immediately:

• $$x = 2w$$ (universe bits),

• $$m = w + o$$ (quotient bits),

• $$r = 2w - w - o = w - o$$ (remainder bits), and

• $$b = 2w - w - o + 3 = w - o + 3$$ (bucket bits).

Let the Compact Tree Database be a Tree Database with the root table replaced by a compact hash table with the dimensions provided above, ergo: $$n = \left| {V}\right| = \left| {T_{\text {root}}}\right| = 2^{w+o} = 2^m$$. Theorem 4 gives its best-case memory usage.

### Theorem 4

(Compact tree best case) In the best case and with $$k\ge 8$$, the compact tree database requires less than $$\text {CT}_{\text {opt}} \,\triangleq \,(w- o +3)n + 4w\sqrt{n} + 2w\root 4 \of {n} (k-4)$$ bits to store n vectors.

### Proof

According to Theorem 3, there are at most $$n + 2\sqrt{n} + \root 4 \of {n} (k-4)$$ tuples in a tree with optimal storage. The root table contains n of these tuples; its descendants contain at most $$2\sqrt{n} + \root 4 \of {n} (k-4)$$ tuples of 2w bits each, i.e., at most $$4w\sqrt{n} + 2w\root 4 \of {n} (k-4)$$ bits. The n tuples in the root table can now be stored using $$w - o + 3$$ bits in the compact hash table buckets instead of 2w bits; hence, the root table uses $$n(w - o +3)$$ bits. $$\square$$
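To get a feeling for the magnitudes involved, the bound of Theorem 4 can be evaluated next to the corresponding best-case bound for a tree with ordinary tables (Theorem 3, with 2w bits per tuple). The parameter values $$w = 30$$, $$o = 2$$, $$n = 2^{32}$$ and $$k = 8$$ are picked for illustration, and the function names are assumptions of this sketch.

```python
import math

def ct_opt_bits(n, k, w, o):
    """Best-case size in bits of the compact tree (Theorem 4)."""
    return (w - o + 3) * n + 4 * w * math.sqrt(n) + 2 * w * n ** 0.25 * (k - 4)

def plain_tree_opt_bits(n, k, w):
    """Best-case size in bits of a tree with ordinary tables:
    the tuple count of Theorem 3 times 2w bits per tuple."""
    return 2 * w * (n + 2 * math.sqrt(n) + n ** 0.25 * (k - 4))

w, o, k = 30, 2, 8
n = 2 ** (w + o)                      # a completely filled root table
compact_per_state = ct_opt_bits(n, k, w, o) / n
plain_per_state = plain_tree_opt_bits(n, k, w) / n
```

For these values the compact tree needs just over $$w - o + 3 = 31$$ bits per state in the best case, against just over $$2w = 60$$ bits for the tree with ordinary tables.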

Finally, Theorem 5 relates the compact tree compression results to our information theoretical model in Sect. 2, under the reasonable assumption that $$8 \le k \le \root 4 \of {n} + 4$$. It shows that $$\text {CT}_{\text {opt}}$$ can approach $$H_{\text {state}}$$ up to a factor $$\frac{w - o +6}{u}$$. As a consequence, when the overcommit ($$o - 6$$ bits) fills the gap of $$w - u$$ bits between the sizes of references in the tree (w bits) and the sizes of variables (u bits), the optimal compression is approached with the compact tree. If $$o-6 > w -u$$, the compact tree can even surpass the compression predicted by our information theoretical model. This is not surprising as the tree with $$k=2$$ reduces to a compact hash table, for which a different information theoretical model holds [17, 39].

### Theorem 5

Let $$\text {CT}_{\text {opt}}$$ be the best-case compact tree compressed vector sizes. We have $$\text {CT}_{\text {opt}} \le H_{\text {state}}$$ provided that $${w - o +6} \le {u}$$ and $$8 \le k \le \root 4 \of {n} + 4$$.

### Proof

According to Theorem 1, $$nH_{\text {state}} \le un + \log _2(k)n + 2n$$ bits. According to Theorem 4, the compact tree database uses at most $$\text {CT}_{\text {opt}} \,\triangleq \,(w-o+3)n + 4w\sqrt{n} + 2w\root 4 \of {n} (k-4)$$ bits in the best case and with $$k\ge 8$$.

We now derive c in $$\text {CT}_{\text {opt}} \le c H_{\text {state}}$$ using the lower bound from Theorem 1.

\begin{aligned}&(w-o+3)n + 4w\sqrt{n} + 2w\root 4 \of {n} (k-4) \\&\quad {\mathop {\le }\limits ^{?}}c\log _2(y-1)n + c\log _2(k-1)n + cn \\&wn-on+3n + 4w\sqrt{n} + 2w\root 4 \of {n} (k-4) \\&\quad {\mathop {\le }\limits ^{?}}c\log _2(y-1)n + c\log _2(k-1)n + cn\\&(w-o-c+3)n + 4w\sqrt{n} + 2w\root 4 \of {n} (k-4) \\&\quad {\mathop {\le }\limits ^{?}}c\log _2(y-1)n + c\log _2(k-1)n \quad [-cn]\\&(w-o-c +3)n + 4w\sqrt{n} + 2w\root 4 \of {n} (k-4) \\&\quad {\mathop {\le }\limits ^{?}}c(u-1)n + c\log _2(k-1)n \quad [\hbox {r.r., see Footnote 6}]\\&4w\sqrt{n} + 2w\root 4 \of {n} (k-4) \\&\quad {\mathop {\le }\limits ^{?}}(c(u-1) - w + o + c - 3)n + c\log _2(k-1)n \quad [-..]\\&4w\sqrt{n} + 2w\root 4 \of {n} (k-4) \\&\quad {\mathop {\le }\limits ^{?}}(cu - w + o - 3)n + c\log _2(k-1)n \\&4w\sqrt{n} + 2w\root 4 \of {n} (k-4) \\&\quad {\mathop {\le }\limits ^{?}}(cu - w + o - 3)n \quad [\hbox {reduce right by }c\log _2(k-1)n]\\&4w/\sqrt{n} + 2w (k-4) /{n^{\nicefrac 34}} \\&\quad {\mathop {\le }\limits ^{?}}cu - w + o - 3 \quad [\hbox {divide by }n]\\&4w/\sqrt{n} + 2w \root 4 \of {n} /{n^{\nicefrac 34}} \\&\quad {\mathop {\le }\limits ^{?}}cu - w + o - 3 \quad [\hbox {increase left by } n \ge (k-4)^4]\\&4w/\sqrt{n} + 2w/\sqrt{n} {\mathop {\le }\limits ^{?}}cu - w + o - 3 \\&6w/\sqrt{n} {\mathop {\le }\limits ^{?}}cu - w + o - 3 \\&3 {\mathop {\le }\limits ^{?}}cu - w + o - 3 \quad [\hbox {increase left by }w / \sqrt{n} \le 1/2 \hbox {, see Footnote 7}]\\&w - o + 6 {\mathop {\le }\limits ^{?}}cu \quad [+w - o + 3]\\&\frac{w - o + 6}{u} {\mathop {\le }\limits ^{?}}c \end{aligned}

Taking $$c=1$$, we obtain that $$\text {CT}_{\text {opt}} \le H_{\text {state}}$$ provided that $${w -o + 6}\le {u}$$ and $$n \ge (k-4)^4$$. $$\square$$
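The inequality can also be checked numerically. The following is a minimal sketch, not part of the original analysis: it evaluates both sides of the first line of the derivation for $$c=1$$, under the assumption that $$y = 2^u$$ (the number of values of a u-bit variable). The function names and the parameter choice are hypothetical.

```python
import math


def ct_opt_bits(w, o, n, k):
    """Best-case compact tree size in bits for n states (bound quoted from Theorem 4)."""
    return (w - o + 3) * n + 4 * w * math.sqrt(n) + 2 * w * n ** 0.25 * (k - 4)


def entropy_lower_bound_bits(u, n, k):
    """Right-hand side of the first line of the derivation with c = 1.

    Assumption: y = 2**u, i.e. y counts the values of a u-bit variable.
    """
    y = 2 ** u
    return (math.log2(y - 1) + math.log2(k - 1) + 1) * n


def preconditions_hold(w, o, u, n, k):
    """Side conditions used in the derivation, including w / sqrt(n) <= 1/2."""
    return (
        w - o + 6 <= u
        and 8 <= k <= n ** 0.25 + 4
        and w / math.sqrt(n) <= 0.5
    )


# Hypothetical parameter choice that satisfies all side conditions.
w, o, u, n, k = 30, 6, 30, 10**6, 8
assert preconditions_hold(w, o, u, n, k)
print(ct_opt_bits(w, o, n, k) <= entropy_lower_bound_bits(u, n, k))
```

Note that the side condition $${w - o + 6} \le {u}$$ is sufficient but not necessary: the derivation discards the term $$c\log _2(k-1)n$$, so the inequality may still hold for parameters that violate it.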

## 5 Experiments

We implemented the Compact Tree in the model checker LTSmin. This implementation is based on two concurrent data structures: a tree database and a compact hash table, based on Cleary’s approach. The parameters of the Compact Tree Table in this implementation are (for details see ):

• $$w = 30$$ bits (The internal tree references are 30 bits wide)

• $$u = 30$$ bits (The state variables can be 30-bit integers, often less is used)

• $$o = 2$$ bits (The root table fits a maximum of $$2^{32}$$ elements)

LTSmin is a language-independent model checker based on a partitioned next-state interface. We exploit this property to investigate the compression ratios of the Compact Tree for five different input types: DVE models written for the DiVinE model checker, Promela models written for the spin model checker, process algebra models written for the mCRL2 model checker, Petri net models from the MCC contest, and EventB/ProB models from LTSmin’s ProB frontend [2, 28]. Table 2 provides an overview of the models in each of these input formats and a justification for the selection criterion used. In total, over 400 models were used in these benchmarks.

We compare the Compact Tree against different compressed and uncompressed data structures: a hash table, spin’s collapse tables, Tries, Binary Decision Diagrams (BDDs) [6, 8] and Multi-Valued Decision Diagrams (MDDs) [35, 41].

All experiments ran on a machine with 128 GB memory and 48 cores: four AMD Opteron™ 6168 processors with 12 cores each.

### 5.1 Compression ratio

Compressed state sizes of our implementation can approach $$w - o + 3 = 31$$ bits, or approximately 4 Bytes, by Corollary 1 and Theorem 4. We first investigate whether this compression is actually reached in practice. Figure 8 plots the compressed sizes of the state vectors against the length of the uncompressed vector. We see that for some models, the optimal compression is indeed reached. The average compression is 6.97 Bytes per state. The fact that there is little correlation with the vector length confirms that the compressed size indeed tends to be constant: vectors of up to 1000 Bytes are compressed to just above four Bytes. Figure 9 furthermore reveals that good compression correlates positively with the state space size, which can be expected as a larger state space allows the tree to exhibit more sharing.
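As a worked instance of the bound used above, substituting the parameters of this implementation ($$w = 30$$, $$o = 2$$) into the best-case per-state size gives:

\begin{aligned} w - o + 3 = 30 - 2 + 3 = 31 \text { bits} = \tfrac{31}{8} \text { Bytes} = 3.875 \text { Bytes}. \end{aligned}

Measured against this figure, the reported average of 6.97 Bytes per state is about $$6.97 / 3.875 \approx 1.8$$ times the best case.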

Only among the Petri net and DVE models do we find instances that exhibit worse compression (between 10 and 15 Bytes per state), even when the state space is large. However, we observed that in these cases, the vector length k is also large, e.g., the two Petri net instances with a compressed size of around 12 Bytes per state have $$k > 400$$. Based on some earlier informal experiments, we believe that with some variable reordering, this compression might very well be improved to reach the optimum. Thus far, however, we were unable to derive a reordering heuristic that consistently improves the compression.

### 5.2 Runtime performance and parallel scalability

In the introduction, we mentioned the requirement that a visited-set database ideally features constant lookup times, as in a normal hash table. To check this, we compare the runtime of LTSmin on the DVE models with that of the spin model checker, a model checker known for its fast state generator.Footnote 8 Figure 10 confirms that the sequential runtimes of LTSmin with the Compact Tree are on par with those of spin, and often even better. We attribute this performance mainly to the incremental vector insertion discussed in Sect. 3 (see Fig. 6). Based on the MCC 2016 results, we believe that LTSmin’s performance is on par with other Petri net tools as well.

The measured performance first of all confirms that the Compact Tree satisfies its requirements. Secondly, it provides a good basis for the analysis of parallel scalability (if we had chosen to implement the Compact Tree in a slow scripting language, the slowdown would yield “free” speedup). Figure 11 compares the sequential runtimes to the runtimes with 48 threads. The measured speedup often surpasses 40x, especially when the runtimes are longer and there is more work to parallelize. Speedups are good regardless of input language.

### 5.3 Comparison with spin ’s collapse compression

spin’s collapse compression uses the process structure in the input to fold vectors, similar to tree compression, but with only one table per process, whereas the tree continues splitting vectors until only tuples are left. The lower bounds reported in the current paper cannot be reached with collapse due to its n-ary tree structure and its limit of two levels. Our benchmarks compare the Compact Tree with spin’s collapse compression in both per-state compressed size (see Fig. 12) and total memory use (see Fig. 13). We used all DVE inputs that were translated to Promela and have the same state count in spin as in LTSmin. Both the compression and the total memory use of the Compact Tree improve upon collapse by an order of magnitude.
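To make the folding concrete, the following is a simplified sketch of tree compression that keeps splitting a vector until only tuples are left. It is an illustration under stated assumptions, not the LTSmin implementation: it uses one Python dictionary per tree node instead of concurrent (compact) hash tables, and the class and method names are hypothetical. The example vectors are the initial Bakery state and its two successors from the introduction, with $$\bot$$ and $$\top$$ encoded as 0 and 1.

```python
class TreeDatabase:
    """Simplified tree compression: one dict per tree node, no compact hashing."""

    def __init__(self):
        self.tables = {}  # node name -> {tuple: index}

    def _lookup_or_insert(self, node, pair):
        table = self.tables.setdefault(node, {})
        # A new tuple receives the next free index in this node's table.
        return table.setdefault(pair, len(table))

    def insert(self, vector, node="root"):
        """Fold a vector (length >= 2) into tuples; return its root index."""
        if len(vector) == 1:
            return vector[0]  # a single value needs no table entry
        mid = (len(vector) + 1) // 2
        pair = (
            self.insert(vector[:mid], node + "/L"),
            self.insert(vector[mid:], node + "/R"),
        )
        return self._lookup_or_insert(node, pair)

    def entries(self):
        """Total number of tuples stored over all tree nodes."""
        return sum(len(table) for table in self.tables.values())


db = TreeDatabase()
states = [(0, 0, 0, 0, 0, 0), (1, 0, 1, 0, 0, 0), (0, 0, 0, 1, 0, 1)]
roots = [db.insert(state) for state in states]
print(roots, db.entries())
```

Each additional state costs exactly one new root tuple, while the tuples below the root are shared whenever a subvector was seen before. A collapse-style structure with one table per process corresponds to stopping this recursion after the first split.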

### 5.4 Comparison with trie compression

Jensen et al. propose a Trie for storing state vectors. Tries compress vectors by ensuring sharing between prefixes. BDDs also store state vectors efficiently; however, Jensen et al. consider them too slow for state space exploration. The tool from  implements reachability with Tries for Petri nets. We compare it to the compact tree in LTSmin in Figs. 14 and 15. We see that its compression correlates positively with the state space size. With its near-optimal compression for the Petri net models, the Compact Tree provides at least a factor 2 improvement over the Trie. The Trie, however, exhibits better runtime performance, especially for the small and large state spaces. We suspect that the Trie experiences more caching benefits for small problems and that the hash table probes in the Compact Tree become more expensive for larger ones, as shown in .
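The prefix sharing mentioned above can be illustrated with a generic trie over vector components. This is a minimal sketch using nested dictionaries; it is not the data structure of Jensen et al., and the function name is hypothetical. It only shows that a trie shares common prefixes, whereas a repeated suffix is stored again.

```python
def trie_insert(root, vector):
    """Insert a vector into a nested-dict trie; return the number of new nodes."""
    node, created = root, 0
    for value in vector:
        if value not in node:
            node[value] = {}
            created += 1
        node = node[value]
    return created


trie = {}
vectors = [(0, 0, 0, 0, 0, 0), (0, 0, 0, 1, 0, 1), (1, 0, 1, 0, 0, 0)]
new_nodes = [trie_insert(trie, vector) for vector in vectors]
print(new_nodes)
```

The second vector shares the prefix (0, 0, 0) with the first and therefore adds only three nodes. The third vector ends in the already stored subvector (0, 0, 0) but shares no prefix, so it adds six nodes.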

### 5.5 Comparison with BDDs

Model checking with BDDs is done semi-symbolically in LTSmin, using its partitioned next-state interface. First, each next-state partition represents one action in the underlying modeling formalism. (LTSmin is language independent, but an action can be, e.g., a statement in a process guarded by a program counter.) Second, the model checker creates an empty BDD representing the transition relation for each partition. (The relations are interpreted disjunctively, implementing asynchronous behavior in the input.) These relation BDDs are projected to the variables involved in the underlying action, which in many cases are just a few variables, e.g., a program counter update and a variable reference/update. Third, starting from the initial state, the model checker fills the relation by calling the next-state function and adding the result to the relation BDD. Because the BDDs are (often) defined over a subset of all variables, the learning terminates after a while and the closure is computed fully symbolically inside the BDDs.
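The learning step can be sketched as follows. This is an illustration only: plain Python sets and dictionaries stand in for the BDDs, so the image computation here is explicit rather than symbolic, and the function and parameter names are hypothetical. Each partition is given as a pair of the variable indices it involves and a next-state function on the projected (short) vector. The sketch counts how often the next-state function is called.

```python
def semi_symbolic_reachability(initial, groups):
    """Explore all states; learn each projected relation on the fly.

    groups: list of (variables, next_short), where variables is a tuple of
    indices and next_short maps a short vector to its short successors.
    Returns the visited set and the number of next-state calls.
    """
    relations = [dict() for _ in groups]  # short source -> short successors
    calls = 0
    visited = {initial}
    frontier = {initial}
    while frontier:
        successors = set()
        for (variables, next_short), relation in zip(groups, relations):
            for state in frontier:
                short = tuple(state[i] for i in variables)
                if short not in relation:  # learn this projection only once
                    relation[short] = set(next_short(short))
                    calls += 1
                for short_successor in relation[short]:
                    successor = list(state)
                    for i, value in zip(variables, short_successor):
                        successor[i] = value
                    successors.add(tuple(successor))
        frontier = successors - visited
        visited |= frontier
    return visited, calls


def flip(short):
    """Hypothetical action: toggle a single bit."""
    return [(1 - short[0],)]


k = 10
groups = [((i,), flip) for i in range(k)]
visited, calls = semi_symbolic_reachability((0,) * k, groups)
print(len(visited), calls)
```

In this toy model, every action involves a single variable, so each relation is complete after two next-state calls, although the state space has $$2^{10}$$ states. This is the effect referred to above as the learning terminating after a while.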

When the reachability procedure converges to a fixpoint, the BDD representing the visited set encodes all reachable states. However, intermediary (non-fixpoint) visited set BDDs might be much larger than the final visited set (as not all subsets are efficiently represented by BDDs). (For this reason, symbolic reachability with BDDs is rather sensitive to the search order used [10, 41].) On the other hand, the space requirements of (compact) tree compression grow monotonically with the number of inserted vectors, making them insensitive to search orders. (No subset ever takes more space than storing the entire state space.) Therefore, it could be argued that symbolic BDD-based model checking is limited by the largest BDD that needs to be stored during the entire reachability procedure. However, we are strictly interested in compression here (not in investigating the smallest possible intermediary BDDs with different search orders) and hence focus on the final BDD representing all reachable states. Nonetheless, Sect. 5.7 investigates the difference in size between the final decision diagram and the peak intermediary decision diagram.

We compare both the runtime of the model checking procedure and the compressed sizes of the visited set as BDDs with those of tree compression. Apart from the intermediary visited set sizes, another point should be raised about this comparison. The semi-symbolic approach might not be as efficient as a fully symbolic procedure, as found in model checkers such as NuSMV, since LTSmin learns the transition relation on the fly. On the other hand, LTSmin allows for more freedom in the next-state function implementation, e.g., multiplication of integer variables. Therefore, we can expect similarly unwieldy BDD sizes for such hard inputs in NuSMV. For optimal results in LTSmin, we ran the symbolic tool with the flags for saturation and variable reordering. Figures 16 and 17 show the results of the comparisons. We observe that the compressed state sizes in BDDs are unrelated to tree compression sizes. The latter are always (slightly) larger than the minimum of four Bytes per state, while BDDs can even compress better than that. This can be expected, as a single BDD node, the “true” node, represents all states. Other large subsets might also be represented efficiently (for example, when there is no correlation between variable values). Nonetheless, the converse is also true, as the BDD can explode in size (which we see happening here for two Promela models and mCRL2’s process algebraic models).

Runtime, however, does not follow the trend of the compression in BDDs. This is likely because the intermediary BDDs are larger and/or because it may take long before the fixpoint is computed in BDDs. (The BDD operations used for image computation are polynomial in the size of the BDD, i.e., potentially exponentially faster than handling the states in the BDD individually. However, many iterations might be needed before the fixpoint is found, so even when intermediary BDDs are small, the reachability might take long.)

### 5.6 Comparison with MDDs

LTSmin uses Sylvan as BDD/MDD implementation. Multi-valued decision diagrams are implemented in Sylvan as List Decision Diagrams (LDDs). Like MDDs, the edges in LDDs represent integer values rather than the Boolean values in BDD edges, thereby often reducing the size for state spaces of software systems. Additionally, LDDs represent the resulting n-ary tree as a binary tree, à la Knuth, to allow sharing between sublists at the same level. (Each level represents one totally ordered variable.)
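The binary encoding of the n-ary tree can be sketched as follows. This is an illustrative reconstruction, not Sylvan's implementation or API: each node is a triple of a value, a "down" reference to the next level and a "right" reference to the next value at the same level, and a dictionary serves as a unique table so that equal sublists are stored once. All names are hypothetical.

```python
FALSE, TRUE = 0, 1  # terminals: the empty set, and the set holding the empty vector


def build_ldd(vectors, unique):
    """Build a list decision diagram for a sorted list of distinct tuples.

    unique maps (value, down, right) triples to node identifiers, so that
    identical sublists are shared.
    """
    if not vectors:
        return FALSE
    if vectors == [()]:
        return TRUE
    first = vectors[0][0]
    same = [v[1:] for v in vectors if v[0] == first]  # continue one level down
    rest = [v for v in vectors if v[0] != first]      # remaining values, this level
    node = (first, build_ldd(same, unique), build_ldd(rest, unique))
    return unique.setdefault(node, len(unique) + 2)


unique_table = {}
all_pairs = sorted({(i, j) for i in range(3) for j in range(3)})
root = build_ldd(all_pairs, unique_table)
print(len(all_pairs), len(unique_table))
```

For the nine vectors in this example, the three first-level nodes all point down to the same second-level list, so six nodes suffice.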

From Fig. 18, we see that LDDs seem to provide an order of magnitude better compression ratios than BDDs. The compact tree still beats the LDDs, but in fewer cases than it beats BDDs. The experiments show that runtimes of the tree and LDDs are harder to compare (see Fig. 19). Looking at the larger models (longer runtimes), we see, though, that DVE problems seem better suited for tree compression, while the other input languages tend to verify faster with LDDs.

### 5.7 Decision diagram peak sizes

As mentioned in Sect. 5.5, the size of a decision diagram does not monotonically grow with the size of the set it stores. For this reason, the bottleneck for symbolic reachability using BDDs is the peak-sized decision diagram encountered during the entire procedure. To investigate the impact, we measured the peak decision diagram size for the MDD-based reachability and compared it against the size of the final BDD. For this experiment, we use a saturation search order as it is known to keep the size of the intermediate decision diagrams smallest [10, 41]. Figure 20 shows that the peak sizes can be almost an order of magnitude larger than final sizes; however, for most inputs the impact is not that pronounced.

### 5.8 Comparison with parallel BDDs/MDDs

Figures 21 and 22 show the speedups of BDDs/LDDs. Good speedups are less common than with tree compression (cf. Fig. 11). We do observe however that BDDs scale better than MDDs. The difference in speedup results in a performance advantage for the tree approach, when parallel verification is used, as Figs. 23 and 24 show.

### 5.9 Case study: GARP

To push the envelope in enumerative model checking, we performed a case study using the GARP protocol as implemented by Konnov and Letichevsky. Instantiated with one bridge and two applications, the implementation has $$3.31\mathrm {e}11$$ (331 billion) states according to the symbolic backend of LTSmin, which takes 3.2 h to explore the full state space. Using partial-order reduction, we could fully explore the model with the enumerative multi-core backend for the first time. (While the model could already be explored symbolically with BDDs, this feat is still of interest as enumerative analysis methods can more efficiently verify certain temporal properties on the fly.)

To this end, we used another machine with 512 GB memory and 64 cores: four AMD Opteron™ 6376 processors with 16 cores each. The model checker was configured to use all available memory by setting:

• $$w = 36$$

• $$o = 7$$

• $$u = 30$$

Table 3 shows the results of the case study. The first thing to note is that the OS-reported memory use is close to the space occupied by filled table buckets in the Compact Tree. This means that the implementation performs according to the design as predicted by the analysis in Sects. 3.2 and 4.2. (In fact, due to likely paging of all allocated table buckets, the OS-reported memory should be closer to the memory allocated by the Compact Tree. And the difference of 41 GB between tree-allocated and OS-reported memory is taken up by the five billion states we measured which were stored in the queues at peak.)

Furthermore, the compressed state size of 32.5 bits per state is close to the best case of $$\approx 32$$ bits as predicted by Theorem 4 according to the above parameters. Interesting to note is that this amount of space used per state is smaller than the space required for a unique state identifier, which takes 35.6 bits given that $$5.2\mathrm {e}10$$ states are stored. This can be explained by the fact that the compact hash table uses the location of buckets as information as discussed in Sect. 4.1.
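The two figures compared above follow from short calculations, sketched here with hypothetical helper names: the best-case size $$w - o + 3$$ for the parameters of this case study, and the number of bits needed for a unique identifier of one out of $$5.2\mathrm {e}10$$ states.

```python
import math


def best_case_bits(w, o):
    """Best-case compressed size per state in bits (w - o + 3)."""
    return w - o + 3


def identifier_bits(num_states):
    """Bits needed to give each of num_states states a unique identifier."""
    return math.log2(num_states)


garp_best_case = best_case_bits(w=36, o=7)
garp_identifier = identifier_bits(5.2e10)
print(garp_best_case, round(garp_identifier, 1))
```

The measured 32.5 bits per state thus lies between the best case of 32 bits and the 35.6 bits of an explicit identifier.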

## 6 Discussion and conclusion

The tree compression method discussed here is a more general variant of recursive indexing, which only breaks down processes into separate tables. Hash compaction compresses states to an integer-sized hash, but this lossy technique becomes redundant with the compact tree database. Bloom filters still present a worthwhile lossy alternative using only a few bits per state, but of course abandon soundness when applied in model checking.

Valmari and Geldenhuys present a data structure similar to Cleary’s.

Evangelista et al. report on a hash table storing incremental differences of successor states (similar to the incremental data structure discussed in Sect. 3). Partial vectors in the table contain a pointer to one predecessor, and only the initial vector is stored fully. Their partial vectors take $$2u + \log (E)$$ bits, where E is the set of (deterministic) actions in the model. To look up vectors in this database, a state is hashed to a table bucket. Defying our requirement of constant time for lookups, Evangelista et al. reconstruct full states by reconstructing all ancestors (in the worst case, there might be as many ancestors as reachable states). We could not compare to this approach due to the lack of an available implementation.

Much like in BDDs, the variable ordering influences the number of nodes in a tree table and thus the compression, as mentioned in Sect. 1. Consider the vector set $$\left\{ \mathbf {i,i,j,j} \mid i,j \in [1\dots N] \right\}$$: Only the root node in a compact tree will contain $$N^2$$ entries, while the leaf nodes contain N entries. On the other hand, we have no such luck for the set $$\left\{ \mathbf {i,j,i,j} \mid i,j \in [1\dots N] \right\}$$, where the leaf nodes contain $$N^2$$ entries as well. Preliminary research revealed that the tree’s optimum can be reached in most cases for DVE models, but we were unable to find a heuristic that consistently realizes this.
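The effect of the two orderings can be reproduced with a small sketch. It assumes a tree of depth two for vectors of length four, with one table for each of the two leaf nodes and one for the root; the function name is hypothetical and the dictionaries stand in for the actual tables.

```python
def tree_table_sizes(vectors):
    """Return the number of entries in the left leaf, right leaf and root table."""
    left, right, root = {}, {}, {}
    for a, b, c, d in vectors:
        left_index = left.setdefault((a, b), len(left))
        right_index = right.setdefault((c, d), len(right))
        root.setdefault((left_index, right_index), len(root))
    return len(left), len(right), len(root)


N = 10
values = range(1, N + 1)
good_order = [(i, i, j, j) for i in values for j in values]
bad_order = [(i, j, i, j) for i in values for j in values]
print(tree_table_sizes(good_order), tree_table_sizes(bad_order))
```

With the first ordering, each leaf table holds N entries and only the root holds $$N^2$$ entries. With the second ordering, all three tables hold $$N^2$$ entries, so each vector costs roughly three tuples instead of one.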