
1 Introduction

Boolean functions play a central role in the design and analysis of computing systems. They frequently appear in different representations through logics, circuits, machine learning classifiers, or binary decision diagrams (BDDs) [1, 12]. In particular, BDD representations are appealing as they are strongly normalizing and provide efficient operations such as applying Boolean operators, finding and counting satisfying assignments, or checking equivalence. Applications of BDDs encompass a wide range, including symbolic model checking and logic synthesis [13, 15, 16, 20, 26]. Much work on BDD research and implementations was conducted during the first two decades after Bryant’s seminal work [12]. This led to various other types of decision diagrams (DDs) that extend the core principles beyond Boolean functions or improve efficiency for specific applications. Most prominently, multi-terminal BDDs (MTBDDs) [4, 17] enable pseudo-Boolean function representations, zero-suppressed BDDs (ZBDDs) [32] usually provide more efficient representations for sparse sets than BDDs, and list DDs (LDDs) [9] efficiently encode transition vectors.

The most frequently used BDD libraries that are still considered state-of-the-art are BuDDy [27] and CUDD [40]. They originate from the 90s and do not fully exploit recent scientific advancements and modern design opportunities. Therefore, DDs, and BDDs in particular, have been gaining attention again, incorporating insights from satisfiability checking [8] but also providing advances in distributed and parallel computation and feature selection algorithms [7, 18, 23, 39]. Sylvan [18] is a more recent BDD library that focuses on multithreaded operators; however, it is also entirely written in C. Hence, all memory management must be done manually, which is particularly challenging in a parallel setting. Manual resource management is one of the common sources of bugs that lead to undefined behavior (UB), a situation where the programming language does not assign any semantics to the code. Consequences of UB are crashing programs or wrong results, the latter being particularly intolerable in verification tools and other critical applications where BDD libraries are commonly employed. Further, while existing libraries provide support for different kinds of BDDs such as MTBDDs or ZBDDs, the inherent lack of genericity in C required specifically tailored implementations. More elaborate extensions, e.g., towards ternary decision diagrams (TDDs) [38], would also require major internal changes to the library implementations.

In this paper, we develop a new DD framework, called OxiDD, to provide the basis for future developments in DD research and technology. As such, OxiDD focuses on easing the implementation of new DD types, providing reusable components commonly used in different kinds of DDs, and relying on modern technology. This leads to the following four major development goals for OxiDD: safety, concurrency, modularity, and performance.

By safety, we mean the absence of undefined behavior. Concurrency refers to thread-safety when used from multithreaded applications on the one hand. On the other hand, the framework itself should leverage multicore architectures for performance. Modularity should already be fulfilled by the nature of a framework, clearly separating concerns and enhancing extensibility. Here, clear interfaces should separate algorithms from data structures and allow to easily replace implementations of a component by another.

We tackle all four development goals by implementing OxiDD in Rust, which is considered to be a safe programming language. Rust achieves safety via a rich type system but does not compromise performance: usually, Rust programs do not show any runtime overhead compared to C/C++. Furthermore, Rust allows us to define clear and generic interfaces, as well as efficient implementations of data structures. Here, too, genericity does not come with any runtime overhead, as the compiler generates specialized code at compile time. For high performance, we opt into Unsafe Rust, a language syntactically separated from Safe Rust using the unsafe keyword. Unsafe Rust enables a few additional operations whose safety cannot be checked by the compiler. Connecting Unsafe and Safe Rust requires safe abstractions upholding the central soundness property of Safe Rust: “No matter what, Safe Rust can’t cause Undefined Behavior.” [36] The art is to keep the portion of unsafe code as small as possible without violating the soundness property. One instance where we need Unsafe Rust is to support reordering of variables without node-wise locking. In this case, designing safe abstractions has been challenging. In the end, however, we gain both performance and implementations of all DD operations entirely in Safe Rust.

Contributions and Outline. We report on generic implementations of BDDs, MTBDDs, ZBDDs, and TDDs in OxiDD, focusing on implementation design and evaluating OxiDD’s performance. Section 2 gives a detailed description of these DD types and enhancements. For working with these DDs from Rust, we provide high-level interfaces similar to those of existing libraries that—in contrast to those—cannot cause UB, and also provide C and C++ bindings. Section 3 goes into more detail about the framework’s architecture and implementation details. We also point out some insights from tuning the data structures for performance. For this, we design safe abstractions, a highly non-trivial process we report on in Section 3.3. In Section 4, we finally evaluate the performance of OxiDD’s BDD implementation. Our results show that OxiDD is on par with existing libraries, and even outperforms them in certain scenarios. This lets us conclude that in OxiDD, safety and modularity do not come at the expense of performance.

Fig. 1. Popular DD libraries

Further Related Work. For an overview comparing the features of popular and recently maintained BDD libraries, see Fig. 1. Here, BCDD refers to BDDs with complemented edges. The standard libraries BuDDy, CUDD, and Sylvan are widely used in several communities due to their manifold BDD manipulation operators and rich functionality. Besides those, there are various other libraries that mostly provide specialized implementations. Biddy [31] mainly started as an educational implementation but nowadays also supports a wide range of different BDD types such as tagged BDDs [14, 19]. Java implementations such as JDD [41], BeeDeeDee [28], or PJBDD [7] provide better safety properties than C implementations, but usually cannot compete in terms of performance. In case DDs grow beyond the size of the entire main memory, it becomes especially important to reduce the amount of random disk accesses. This is what the external-memory libraries Adiar [39] and CAL [37] focus on. Development of CAL ceased back in 1996, but it was recently brought back to life in the context of research on Adiar. Biodivine/LibBDD [6] is a notable BDD implementation in Rust and, to the best of our knowledge, the only Rust library besides OxiDD that supports existential and universal quantification. We are not aware of any DD implementation in the spirit of a modular framework that emphasizes safety as much as OxiDD does, while being concurrent and delivering high performance.

2 Background: Decision Diagrams and Rust

We recall the kinds of DDs relevant to this paper, explain the role of variable orders and variable reordering, and give preliminaries on safe abstractions in Rust.

Fig. 2. Example decision diagrams for Boolean functions \(f, g :\mathbb B^3 \rightarrow \mathbb B\) where \(f(x_0,x_1,x_2) = \lnot (x_1 \vee x_2)\) and \(g(x_0,x_1,x_2) = x_0 \leftrightarrow x_1 \leftrightarrow x_2\)

2.1 Kinds of Decision Diagrams

Decision trees (DTs) are tree-like structures that represent functions through variable-labeled decision nodes and terminal nodes with function outcomes. Each path from the root to a terminal corresponds to an assignment of values to variables, with the terminal giving the function outcome for that assignment. Decision diagrams (DDs) are rooted directed acyclic graphs that arise from DTs by merging isomorphic subtrees. We assume DDs to be ordered, i.e., variable occurrences follow a given total order on all paths in the DD. The order restriction may also be formulated by assigning each node a level, which we number from top to bottom. Then, a variable order \(\sigma \) is a bijection between the levels \(0, \dots , k-1\) and the \(k\) input variables. Terminals are considered to be on a distinguished level \(\infty \) at the bottom. Then, every node at level \(i\) can only have successor nodes at levels greater than \(i\).

Binary DDs (BDDs). The most prominent kind of DDs are BDDs, used to represent Boolean functions \(f:\mathbb B^k \rightarrow \mathbb B\) over \(\mathbb B = \{\bot , \top \}\). They comprise terminal nodes \(\top \) and \(\bot \) as well as inner nodes \(n\) with outgoing “then” and “else” edges pointing to nodes \(n_t\) and \(n_e\), respectively. By \(n, m, ...\), we usually denote nodes and by \(x_0, x_1, \ldots \) variables. BDDs are usually considered to be reduced, i.e., for any inner nodes \(n, m\) (1) \(n_t \ne n_e\) and (2) if \( level (n) = level (m)\), \(n_t = m_t\), and \(n_e = m_e\) then \(n = m\). One major advantage of such BDDs is that they are strongly normalizing, i.e., any two reduced BDDs representing the same Boolean function are isomorphic [21]. Shared BDDs associate function names with nodes, allowing for multiple functions to be represented in a single BDD structure. See Fig. 2a for an example of a (shared reduced) BDD with two functions f and g.

The semantics \([\![ n ]\!]\) of a BDD node \(n\) is recursively defined as a Boolean function. If \(n\) is a terminal, \([\![ n ]\!]\) is a constant function, mapping always to true if \(n = \top \) or false if \(n = \bot \), respectively. If \(n\) is an inner node at level \(i\), then \([\![ n ]\!]\) is \((x_{\sigma (i)} \wedge [\![ n_t ]\!]) \vee (\lnot x_{\sigma (i)} \wedge [\![ n_e ]\!])\), the Shannon decomposition of \([\![ n ]\!]\) w.r.t. \(x_{\sigma (i)}\).
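This recursive semantics is easy to make concrete. The following sketch (illustrative only, not OxiDD code; the `Node` type and `eval` function are our own names) evaluates a BDD node under a variable assignment, following the Shannon decomposition:

```rust
use std::rc::Rc;

// Hypothetical node type for illustration; `level` indexes into the
// assignment via the variable order.
enum Node {
    Terminal(bool),
    Inner { level: usize, t: Rc<Node>, e: Rc<Node> },
}

// Evaluate [[n]] under the given assignment (assignment[i] is the value
// of the variable at level i): follow the "then" edge if the variable is
// true, the "else" edge otherwise.
fn eval(n: &Node, assignment: &[bool]) -> bool {
    match n {
        Node::Terminal(b) => *b,
        Node::Inner { level, t, e } => {
            if assignment[*level] {
                eval(t, assignment)
            } else {
                eval(e, assignment)
            }
        }
    }
}
```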

A BDD is typically created by successively applying Boolean connectives to already existing BDDs. As an example, the apply algorithm for conjunctions works as shown in Fig. 3. Here, it is assumed that the get_or_make_node function at the bottom also maintains reducedness, typically implemented using a hash table called unique table [11]. Note that the runtime of a naïve apply_and implementation is exponential in the number of variables of the functions represented by n and m. By applying memoization, the runtime can be reduced to \(\mathcal O(|\texttt {n}| |\texttt {m}|)\), where \(|\cdot |\) denotes the count of descendant nodes. Memoization is typically implemented using a fixed-size cache called apply cache or computed table. The design of combining unique table and computed table towards an efficient BDD implementation was originally proposed by Brace et al.  [11]. Besides apply algorithms based on recursion, there are also breadth-first apply algorithms implemented, e.g., in the BDD libraries CAL and Adiar [37, 39].

Fig. 3. Apply algorithm for conjunctions (pseudocode)
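A sketch of how the pseudocode of Fig. 3, extended with memoization, might look in Rust (illustrative only, not OxiDD's actual implementation; the node, unique table, and cache types are drastically simplified):

```rust
use std::collections::HashMap;

// Nodes are identified by indices; 0 and 1 are the ⊥ and ⊤ terminals.
const FALSE: usize = 0;
const TRUE: usize = 1;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Inner { level: u32, t: usize, e: usize }

struct Manager {
    nodes: Vec<Inner>,                     // nodes[2..] are inner nodes
    unique: HashMap<Inner, usize>,         // unique table: maintains reducedness
    cache: HashMap<(usize, usize), usize>, // apply cache (memoization) for AND
}

impl Manager {
    fn new() -> Self {
        // two placeholder entries so that inner nodes start at index 2
        let dummy = Inner { level: u32::MAX, t: 0, e: 0 };
        Manager { nodes: vec![dummy, dummy], unique: HashMap::new(), cache: HashMap::new() }
    }

    fn level(&self, n: usize) -> u32 {
        if n <= TRUE { u32::MAX } else { self.nodes[n].level } // terminals: level ∞
    }

    // get_or_make_node: applies the reduction rules (1) and (2)
    fn get_or_make_node(&mut self, level: u32, t: usize, e: usize) -> usize {
        if t == e { return t; } // rule (1)
        let key = Inner { level, t, e };
        if let Some(&n) = self.unique.get(&key) { return n; } // rule (2)
        self.nodes.push(key);
        let n = self.nodes.len() - 1;
        self.unique.insert(key, n);
        n
    }

    fn new_var(&mut self, level: u32) -> usize {
        self.get_or_make_node(level, TRUE, FALSE)
    }

    // Cofactors of node n w.r.t. the variable at `level`
    fn cofactors(&self, n: usize, level: u32) -> (usize, usize) {
        if self.level(n) == level { (self.nodes[n].t, self.nodes[n].e) } else { (n, n) }
    }

    fn apply_and(&mut self, n: usize, m: usize) -> usize {
        // terminal cases
        if n == m { return n; }
        if n == FALSE || m == FALSE { return FALSE; }
        if n == TRUE { return m; }
        if m == TRUE { return n; }
        let key = (n.min(m), n.max(m)); // AND is commutative
        if let Some(&r) = self.cache.get(&key) { return r; } // memoization
        let level = self.level(n).min(self.level(m));
        let (nt, ne) = self.cofactors(n, level);
        let (mt, me) = self.cofactors(m, level);
        let t = self.apply_and(nt, mt);
        let e = self.apply_and(ne, me);
        let r = self.get_or_make_node(level, t, e);
        self.cache.insert(key, r);
        r
    }
}
```

With the cache, each pair of nodes is processed at most once, yielding the \(\mathcal O(|\texttt {n}| |\texttt {m}|)\) bound mentioned above.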

Complement Edges. To reduce the number of nodes in a BDD and to support negation in shared BDDs in \(\mathcal O(1)\), Brace et al. proposed complement edges as a new edge type in DDs [11]. We abbreviate BDDs that contain complement edges by BCDD. The semantics of a complemented edge pointing to a node \(n\) is just \(\lnot [\![ n ]\!]\). To recover a strong normal form, we remove the \(\bot \) terminal node and impose the restriction that a “then” edge is never complemented. The latter forms, besides the two standard conditions on reduced BDDs, the third condition rendering BCDDs reduced. To ensure this condition, any node \(n\) whose “then” edge is complemented can be replaced by a node \(n'\) whose “then” edge is regular. The “else” edge of \(n'\) is the complement of the “else” edge of \(n\) such that \([\![ n' ]\!] = \lnot [\![ n ]\!]\). This means that all nodes that previously referred to \(n\) with a regular edge now have to use a complemented edge to \(n'\) and vice versa. This is the reason why—in contrast to the apply_and in Fig. 3—we formulate all algorithms based on edges (i.e., possibly tagged node references) rather than simple node references. Since functions \(f\) and \(\lnot f\) are represented by a single node, BCDDs may halve the number of nodes compared to BDDs.
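The edge representation and the normalization rule can be sketched as follows (illustrative only; OxiDD's actual edge layout differs). An edge carries a complement bit, so negation is a single bit flip:

```rust
// Illustrative complement edge: a node index whose least significant bit
// is the complement flag.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Edge(usize);

impl Edge {
    fn to_node(node: usize) -> Edge { Edge(node << 1) }
    fn is_complemented(self) -> bool { self.0 & 1 != 0 }
    fn node(self) -> usize { self.0 >> 1 }
    // Negation in O(1): just flip the complement bit
    fn not(self) -> Edge { Edge(self.0 ^ 1) }
}

// Normalization when creating a node: a "then" edge must never be
// complemented. If it is, complement both children and report that the
// edge to the resulting node n' (with [[n']] = ¬[[n]]) must be complemented.
fn normalize(t: Edge, e: Edge) -> (Edge, Edge, bool) {
    if t.is_complemented() {
        (t.not(), e.not(), true)
    } else {
        (t, e, false)
    }
}
```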

Zero-Suppressed BDDs. A function \(f:\mathbb B^k \rightarrow \mathbb B\) may also be interpreted as a characteristic function of a set \(S = \{v \in \mathbb B^k \mid f(v) = 1\} \subseteq \mathbb B^k\). We can even view a Boolean vector as a subset of some “universe” \(U\), so we also have \(S \subseteq \mathcal P(U)\). For example, let \(U = \{a, b\}\). The function \(a\) represents the set of all sets containing \(a\), i.e., \(\{\{a\}, \{a, b\}\}\). Conversely, the set \(\{\{a\}\}\) is represented by the function \(a \wedge \lnot b\). This means that we can use BDDs to represent sets of Boolean vectors or sets of finite sets. If these sets are sparse, however, the corresponding BDD can be very large. Zero-suppressed BDDs (ZBDDs, ZDDs, or ZSDDs), which were introduced by Minato [32], are more apt for this use case. Like BDDs, ZBDDs have inner nodes with two outgoing edges we call hi and lo here. The terminal nodes are \(\varnothing \) (“empty”) and \(\{\varnothing \}\) (“base”). Their semantics is just \([\![ \varnothing ]\!] = \varnothing \) and \([\![ \{\varnothing \} ]\!] = \{\varnothing \}\). For an inner node \(n\) at level \(i\), we have \([\![ n ]\!] = [\![ n_{\texttt{lo}} ]\!] \cup \{x_{\sigma (i)} \cup \alpha \mid \alpha \in [\![ n_{\texttt{hi}} ]\!]\}\). To ensure reduced ZBDDs, a different first condition than BDDs is imposed: While for all nodes \(n\) in BDDs its children should represent different functions, i.e., (1) \(n_t \ne n_e\), in ZBDDs we require that the node itself and the lo-node should represent different functions, i.e., (1’) \(n_{\texttt{hi}} \ne \varnothing \).
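To make the set semantics concrete, here is a small illustrative sketch (not OxiDD code; the `Znode` type is our own name) that enumerates the family of sets a ZBDD node represents, identifying variables with their levels. It reproduces the example above, where the function \(a\) over \(U = \{a, b\}\) represents \(\{\{a\}, \{a, b\}\}\):

```rust
use std::collections::BTreeSet;

// Illustrative ZBDD node; variables are identified by their level number.
enum Znode {
    Empty,                                         // ∅ terminal
    Base,                                          // {∅} terminal
    Inner { var: u32, hi: Box<Znode>, lo: Box<Znode> },
}

// Enumerate [[n]] following the recursive semantics:
// [[n]] = [[n_lo]] ∪ { {x} ∪ α | α ∈ [[n_hi]] }
fn family(n: &Znode) -> BTreeSet<BTreeSet<u32>> {
    match n {
        Znode::Empty => BTreeSet::new(),
        Znode::Base => BTreeSet::from([BTreeSet::new()]),
        Znode::Inner { var, hi, lo } => {
            let mut res = family(lo); // [[n_lo]]
            for mut alpha in family(hi) {
                alpha.insert(*var);   // {x} ∪ α
                res.insert(alpha);
            }
            res
        }
    }
}
```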

Multi-Terminal BDDs (MTBDDs). While BDDs only contain two terminal nodes \(\bot \) and \(\top \), MTBDDs allow for arbitrary finitely many terminals [17]. Hence, MTBDDs can represent functions \(\mathbb B^k \rightarrow S\), where \(S\) is an arbitrary set. A prominent application for MTBDDs is in symbolic probabilistic model checking [5] where \(S=[0,1]\). To allow such infinite sets, terminal nodes are usually created on demand, ensuring finiteness due to finitely many inner nodes of the MTBDD. MTBDDs are also known as algebraic decision diagrams (ADDs) [4].

Multivalued DDs (MDDs). Representing functions \(D_0 \times \cdots \times D_{k - 1} \rightarrow S\) poses implementation challenges. For finite domains \(D_i\), we could rely on a binary encoding and resort to (MT)BDDs, then also called finite domain decision diagrams (FDDs). However, the properties of such FDDs heavily depend on the chosen bit-blasting encoding of the domains. As an alternative, MDDs directly encode multiple values as multiple outgoing edges [25]. Just like in MTBDDs, there is one terminal node per (used) value of \(S\). Ternary decision diagrams (TDDs) may be viewed as one instance of MDDs, where \(D_0 = \dots = D_{k - 1} = S = \{\bot , ?, \top \}\). That is, TDDs represent functions of three-valued logic [38].

2.2 Reordering

The size of a DD—no matter of which kind—may heavily depend on its variable order. There are functions \(\mathbb B^k \rightarrow \mathbb B\) where different variable orders can lead to node counts in the class of \(\varTheta (2^k)\) but also \(\varTheta (k)\). Determining whether a variable order is suboptimal is itself an NP-complete problem [10], but there are heuristics to derive a good variable order from a (propositional) formula describing the function [2, 34]. However, there are applications where such a formula is not available in advance. Furthermore, building the BDD for some intermediate result may require a different variable order than building the final BDD. In such cases, it is possible to reorder the existing DD, e.g., using Rudell’s sifting algorithm [35]. The core of this algorithm is to pick a variable, try out all positions for it, and then move it to the best position. This procedure is repeated until no improvement is made.

There are various other reordering algorithms, but moving a variable to another position usually boils down to swapping all nodes of adjacent levels. Key characteristics of a variable swap are that the semantics of nodes is preserved and that the operation can be performed in-place, i.e., locally. This is crucial, because nodes at levels \(i\) and \(i + 1\) may be referenced by many nodes at higher levels. To explain the swap operation, we restrict ourselves to BDDs for simplicity. Let \(n\) be a node initially at level \(i\) where at least one of \(n_t\) and \(n_e\) is initially at level \(i + 1\). The semantics of \(n\) then depends on both the upper variable \(x = \sigma (i)\) and the lower variable \(y = \sigma (i + 1)\). Hence, \(n\) remains at level \(i\), which after the swap is associated with \(y\), redirecting the edge to \(n_t\) towards a node for \([\![ n ]\!][y {:}{=}\top ]\) (i.e., \([\![ n ]\!]\) with \(y\) set to true) and the edge to \(n_e\) towards a node for \([\![ n ]\!][y {:}{=}\bot ]\). If the new children already existed and the old children have no incoming edges anymore, the node count decreases. Otherwise, it is well possible that the node count stays the same or even increases.

2.3 The Power of Safe Abstractions

Rust’s central soundness property, “No matter what, Safe Rust can’t cause Undefined Behavior” [36], is very powerful. In general, while software components may seem sound in isolation, their composition can still cause UB. This is because computer-checkable interface specifications, e.g., function types, are usually too limited to capture all conditions required to prevent UB.

For Safe Rust, the situation regarding UB—notably including data races—is different. Due to the soundness property, we can be sure that any composition of components either does not cause UB or is forbidden by the type system. While this translates to peace of mind for the user, it also requires a soundness argument for every piece of unsafe Rust code. For instance, the following unsafe code is unsound:
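A minimal function with this flaw, using the names from the discussion below:

```rust
// UNSOUND: callable from Safe Rust, yet it dereferences a pointer that
// may be dangling.
fn bad_deref(ptr: *const u32) -> u32 {
    unsafe { *ptr }
}
```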

Inside the unsafe block, we dereference a raw pointer, which is an unsafe operation. The unsafety arises from the fact that dereferencing a dangling pointer has no defined semantics. Now, we would need to argue why ptr cannot be dangling. However, any pointer can be passed to bad_deref, so the code is unsound.

To remedy this issue, the function must be marked unsafe as well, so that it cannot be called from Safe Rust. Note that this now requires the use of unsafe by the caller. To prevent the entire code base from becoming infected with unsafe, a safe abstraction is required. For instance, the Box type in Rust’s standard library encapsulates a raw pointer and maintains the safety invariant that this pointer is always safe to dereference. As the pointer itself is inaccessible from the outside, this invariant cannot be violated and Box can thus provide a safe method for dereferencing it. The safety of this method is established entirely by local reasoning on the Box type and its safety invariant.
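The pattern can be illustrated with a drastically simplified Box-like type (a sketch, not the real std implementation):

```rust
// A drastically simplified Box: owns a heap allocation via a raw pointer.
// Safety invariant: `ptr` always points to a live allocation owned by this
// value, so it is always safe to dereference.
struct MyBox<T> {
    ptr: *mut T,
}

impl<T> MyBox<T> {
    fn new(value: T) -> Self {
        // Move the value onto the heap; the pointer is valid from now on.
        MyBox { ptr: Box::into_raw(Box::new(value)) }
    }

    // Safe method: sound by local reasoning on the safety invariant alone.
    fn get(&self) -> &T {
        unsafe { &*self.ptr }
    }
}

impl<T> Drop for MyBox<T> {
    fn drop(&mut self) {
        // Reclaim the allocation; `ptr` is never used afterwards.
        unsafe { drop(Box::from_raw(self.ptr)) };
    }
}
```

Since `ptr` is inaccessible from outside, no safe caller can break the invariant, mirroring the argument in the text.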

3 Architecture and Implementation

OxiDD’s architecture is highly modular. In Rust, crates serve as counterparts to packages in languages such as Python, OCaml, or Haskell. OxiDD’s implementation is split into multiple crates to encapsulate functionality and expose a public versioned API. Fig. 4 shows how OxiDD is decomposed into separate crates and how these depend on each other. Each crate has its own well-defined purpose.

Fig. 4. OxiDD’s architecture: dependency graph of the main crates.

The architecture is centered around the core crate that mainly consists of trait definitions which formalize the key concepts of DDs. Traits are Rust’s equivalent to interfaces or abstract classes in object-oriented programming. By using traits for abstracting from concrete implementations of key concepts, OxiDD achieves its high degree of modularity. Notably, there are no dependencies between algorithms and concrete implementations of data structures; all algorithms and data structures are written in a generic way. To provide end users with default implementations, e.g., towards the use of OxiDD as a BDD library, there is the oxidd crate, which assembles defaults that have proven useful in practice.

3.1 The OxiDD Framework

Instead of being yet another DD library, its modular architecture is what makes OxiDD a framework (Fig. 4). Different implementations can be composed and swapped out for alternatives. All functionality has clear interfaces and can be separated into individually maintained and versioned crates. Third-party contributors can easily develop crates for novel kinds of DDs, core data structures, or reordering heuristics. Facilitated by OxiDD’s abstractions, those crates will work seamlessly together, thereby making it ideal for future research on DDs. In this section, we provide further details on key concepts of this framework.

Manager. The manager is the data structure that stores all nodes of a DD and ensures their uniqueness via a unique table [11]. It also provides functionality for delayed garbage collection (GC), where the removal of nodes is delayed as long as possible. Early removal of nodes would lower performance if nodes need to be recreated. An implementation of the manager trait also defines an edge type. An edge is a reference to a node and may additionally have a tag. Tags are used, e.g., to mark edges as complemented in BCDDs. An inner node consists of its outgoing edges and optionally a level number. The latter is required for most kinds of DDs but can be omitted, e.g., in quasi-reduced BDDs.

OxiDD allows for different manager implementations. The manager-index crate contains a manager implementation that uses 32-bit unsigned integers to represent edges. These 32 bits are split into an index referencing a node and a tag. If \(2^{32}\) nodes are too limiting for a use case, it is well possible to implement a different manager, e.g., one where nodes are referred to by pointers. In Fig. 4, this is indicated by the dashed manager-pointer box.
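Packing an index and a tag into 32 bits might look like the following sketch (illustrative; the actual bit layout in manager-index may differ). Here we assume a one-bit tag stored in the most significant bit:

```rust
// Illustrative 32-bit edge: the top bit stores a one-bit tag (e.g., the
// complement flag in BCDDs), the remaining 31 bits the node index.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Edge32(u32);

impl Edge32 {
    fn new(index: u32, tag: bool) -> Self {
        debug_assert!(index < 1 << 31);
        Edge32(index | ((tag as u32) << 31))
    }
    fn index(self) -> u32 { self.0 & !(1 << 31) }
    fn tag(self) -> bool { self.0 >> 31 != 0 }
}
```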

Cache. Typically, each manager has an associated apply cache, which is required by our recursive apply implementations for DD manipulation. Notably, the architecture of OxiDD is also open to other implementations, e.g., for a breadth-first apply algorithm (cf. [37, 39]). The cache crate provides an apply cache as a fixed-size hash table. As with managers, alternative implementations of the apply cache are possible, and they can be freely composed with other implementations of the core infrastructure, e.g., managers.

Functions. Recall that shared DDs can represent several functions of various types (cf. Section 2) in a single data structure. In the graphical DD representation (cf. Fig. 2), functions correspond to the boxed \(f\)s and \(g\)s. From an implementation perspective, a function is an edge paired with a reference to the manager storing the respective node. For end users, functions provide a convenient interface for creating and manipulating DDs.

Support for Various Kinds of DDs. The apply algorithms for the different DD kinds are implemented in the crates starting with rules. Besides the reduction rules, these crates also define terminal node and edge-tag types. Depending only on the abstractions provided by the core crate, other kinds of DDs can easily be implemented. Notably, implementations are also shielded from UB as they can be implemented entirely in safe code.

Reordering. OxiDD provides the fundamental mechanism of swapping levels in DDs for variable reordering (cf. Section 2). Currently, the reorder crate implements functionality to establish a given variable order, e.g., harmonize variable orders of different DDs or impose a static variable order heuristic. Support for dynamic reordering, e.g., via sifting [35], is planned for OxiDD’s next release.

End User Ergonomics. While OxiDD achieves a high degree of modularity through abstraction, this does not come at the expense of developer ergonomics for end users. Fig. 5 shows an example of constructing a manager, creating three variables, building the expression \((x_1 \wedge x_2) \vee x_3\), and then checking satisfiability. Here, 2048 and 1028 are the capacities of the manager for nodes and the apply cache, respectively, and 8 is the number of threads to use (see Line 1).

The method with_manager_exclusive is used to obtain exclusive access to the manager, which is required for creating variables. Like existing libraries, OxiDD offers functions for applying operators (Line 7) or checking satisfiability (Line 8). Note that the interfaces provided by OxiDD shield the user from UB, whether caused by memory mismanagement or data races. Therefore, Fig. 5 does not contain a single line of unsafe code. The question marks ? are part of Rust’s mechanism for handling errors, which may happen, e.g., when running out of memory.

Fig. 5. Constructing a BDD for \((x_1 \wedge x_2) \vee x_3\) with OxiDD’s API.
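The listing of Fig. 5 might look roughly as follows with the published oxidd crate; all names, paths, and signatures in this sketch are assumptions based on the crate's public API rather than taken verbatim from the figure:

```rust
// Sketch only; names and signatures are assumed, not verified against a
// specific oxidd version.
use oxidd::bdd::BDDFunction;
use oxidd::{BooleanFunction, ManagerRef};

fn main() -> Result<(), oxidd::util::OutOfMemory> {
    // Line 1: node capacity 2048, apply cache capacity 1028, 8 threads
    let manager_ref = oxidd::bdd::new_manager(2048, 1028, 8);
    // Creating variables requires exclusive access to the manager
    let (x1, x2, x3) = manager_ref.with_manager_exclusive(|manager| {
        Ok((
            BDDFunction::new_var(manager)?,
            BDDFunction::new_var(manager)?,
            BDDFunction::new_var(manager)?,
        ))
    })?;
    // Line 7: apply operators; Line 8: satisfiability check
    let f = x1.and(&x2)?.or(&x3)?;
    assert!(f.satisfiable());
    Ok(())
}
```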

3.2 Design Choices and Defaults

Implementing OxiDD, we also focused on providing a good set of default implementations, selected and tuned for performance.

Node Store. The manager-index crate implements a store for inner nodes as an array, consisting of an initialized part followed by an uninitialized part. Each element of this array may either be a node along with a reference counter, a free slot with a reference to the next free slot, or uninitialized (see Fig. 6).

Fig. 6. Node store array (binary nodes with level).

When creating a new node, we first check if the linked list of free slots contains an element. If yes, this element is removed from the list and the node is stored there. Otherwise, the first uninitialized slot is used. Should there be no uninitialized slot in the array, then we return an out-of-memory error. When deleting a node, we prepend the node’s slot to the list of free slots.
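A single-threaded sketch of such a slot array (illustrative only; OxiDD's actual store additionally handles reference counters and truly uninitialized memory):

```rust
// Each slot is either a live node or a link in the free-slot list; slots
// beyond the Vec's length model the uninitialized part of the array.
#[derive(Clone, Copy)]
enum Slot {
    Node { level: u32, t: u32, e: u32 },
    Free { next: Option<u32> },
}

struct NodeStore {
    slots: Vec<Slot>,       // stand-in for the fixed-capacity array
    capacity: usize,
    free_head: Option<u32>, // head of the free-slot list
}

impl NodeStore {
    fn new(capacity: usize) -> Self {
        NodeStore { slots: Vec::new(), capacity, free_head: None }
    }

    fn alloc(&mut self, level: u32, t: u32, e: u32) -> Result<u32, &'static str> {
        let node = Slot::Node { level, t, e };
        if let Some(i) = self.free_head {
            // Reuse a free slot, popping it from the list
            let Slot::Free { next } = self.slots[i as usize] else { unreachable!() };
            self.free_head = next;
            self.slots[i as usize] = node;
            Ok(i)
        } else if self.slots.len() < self.capacity {
            // Use the first uninitialized slot
            self.slots.push(node);
            Ok((self.slots.len() - 1) as u32)
        } else {
            Err("out of memory")
        }
    }

    fn free(&mut self, i: u32) {
        // Prepend the slot to the free-slot list
        self.slots[i as usize] = Slot::Free { next: self.free_head };
        self.free_head = Some(i);
    }
}
```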

In a concurrent setting, both the first-uninitialized index and the head of the free-slot list are shared state requiring synchronization. To prevent contention, every worker thread gets its own first-uninitialized index and free-slot list. Instead of incrementing the shared first-uninitialized index by \(1\), a worker pre-allocates the slots until the next multiple of \(2^{16}\). The free-slot list is likewise split into multiple lists of (approximately) \(2^{16}\) elements. The shared state maintains an array of these lists, while each worker holds just one of them. If a worker’s local list reaches \(2^{16}\) nodes during GC, it is moved to the shared state. The large lists avoid frequent synchronization with the shared state and thus contention.

Terminal nodes are managed independently of inner nodes. To distinguish between inner and terminal nodes, we split the 32-bit “address space” into two parts. The first \(N\) node IDs are used for terminal nodes, the remaining ones for inner nodes. The actual array index is the node ID minus \(N\). For example, we set \(N = 2\) in case of BDDs, where ID 0 is used for \(\bot \), and ID 1 for \(\top \). Determining the value of a terminal node does not require any memory operation here. For MTBDDs, however, we have to store terminal nodes in a separate array, similar to the inner node store described above.

Reference Counting. For GC, we use reference counting instead of a mark-and-sweep method. One reason for this design decision lies in the level-local GC used for reordering: iterating through the entire DD for mark-and-sweep GC is very expensive. It would be possible to only materialize reference counters during reordering and use mark-and-sweep GC otherwise (implemented, e.g., in BuDDy [27]). However, this does not resolve the following issue: GC must not remove any objects that are referenced by local variables. In languages like C, C++, and Rust, we cannot simply inspect the program stack. BuDDy resolves this issue using a second stack to register all locally referenced objects. The problem is that accidentally forgetting the registration may lead to use-after-free bugs and ultimately UB. This would imply that apply algorithms need to be written in Unsafe Rust, which is undesirable. Some solutions to this problem have been discussed [22], but they have no advantage over plain reference counting in the case of DDs. Our preliminary benchmarks indicate that the amount of runtime spent on reference counting is in the order of 5 %. Given that mark-and-sweep GC would probably not be zero-cost either, this seems acceptable.

Unique Table. The unique table is split into multiple hash tables, one per level. This split is useful for reordering, where we need to iterate over all nodes on a level. Since we need to grow these tables on demand, we protect each table with a lock. The hash tables in use are designed with cache locality in mind. In particular, we use linear probing to resolve hash collisions. For space efficiency, the tables only contain IDs of the respective nodes, and not the nodes’ outgoing edges. To improve performance when resizing the hash table, which normally requires rehashing all nodes, we store the hash next to the node ID. Thus, we can avoid rehashing any nodes. We further truncate the hash to 31 bits, so we can use the same 32-bit integer to mark the bucket as empty or as tombstone.
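Storing the truncated hash next to the node ID might look like the following sketch (illustrative only: a single level, no locking or resizing, and the table is assumed to never fill up completely):

```rust
// Occupied buckets store a 31-bit hash (< 2^31); the two sentinels below
// use the 32nd bit to mark a bucket as empty or as a tombstone.
const EMPTY: u32 = 1 << 31;
const TOMBSTONE: u32 = (1 << 31) + 1;

// One hash table per level; keeping the hash next to the node ID avoids
// recomputing it on resize and gives a cheap filter during probing.
struct LevelTable {
    hashes: Vec<u32>,
    ids: Vec<u32>,
}

impl LevelTable {
    fn new(capacity: usize) -> Self {
        LevelTable { hashes: vec![EMPTY; capacity], ids: vec![0; capacity] }
    }

    fn insert(&mut self, hash31: u32, id: u32) {
        debug_assert!(hash31 < 1 << 31);
        let len = self.hashes.len();
        let mut i = hash31 as usize % len;
        // Linear probing: skip occupied buckets (hash < EMPTY), stop at
        // the first empty bucket or tombstone.
        while self.hashes[i] < EMPTY {
            i = (i + 1) % len;
        }
        self.hashes[i] = hash31;
        self.ids[i] = id;
    }

    // Look up a node ID by hash; `eq` compares the actual node on a
    // (truncated) hash match.
    fn find(&self, hash31: u32, eq: impl Fn(u32) -> bool) -> Option<u32> {
        let len = self.hashes.len();
        let mut i = hash31 as usize % len;
        while self.hashes[i] != EMPTY {
            if self.hashes[i] == hash31 && eq(self.ids[i]) {
                return Some(self.ids[i]);
            }
            i = (i + 1) % len; // tombstones are probed past, not matched
        }
        None
    }
}
```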

Apply Cache. For the apply cache, we use a fixed-size hash table. Each entry consists of the operator ID, a fixed-size array of operands, and the result of the operation. To synchronize accesses to the table, we use a spinlock per bucket. On ordinary lookups, we do not wait if another thread holds the lock; instead, we recompute the entry. When inserting a new entry, we always replace a previously present entry in the bucket. We also experimented with bucket sizes larger than one entry and replacement policies such as first in, first out (FIFO) and least frequently used (LFU), but these turned out to be slower than the direct-mapped apply cache. One reason might be that in our benchmarks, we generally observed rather few cache hits (in the order of 20-30 %). Larger bucket sizes would require checking more entries before concluding that an entry is not contained in the cache. In addition, FIFO and LFU do not account for the different costs of operations. Ideally, the apply cache would merely keep those entries that take more time to recompute and are also used frequently. We plan to investigate such a strategy in more detail in future work. Notably, such experiments are facilitated by the modular architecture of OxiDD.

A particularly important optimization is to elide reference counter updates when inserting or removing entries from the apply cache. This is because referenced nodes are rarely in the CPU cache. Eliding reference counter updates implies that we must ensure that no nodes are deleted while they are referenced from the apply cache. Nodes can only be deleted during GC and reordering. Since a GC may run in the background, we lock and empty all buckets of the apply cache prior to the GC. Only after the GC do we unlock the buckets again.

Concurrent Apply Algorithms. OxiDD provides recursive apply algorithms in both a single-threaded and a concurrent version. The concurrent version uses task-based parallelism with work-stealing, similar to Sylvan [18]. The idea is to execute the recursive calls (cf. Fig. 3) concurrently. For the implementation, we use the rayon crate [30]. As splitting the work into tasks comes with a runtime overhead (in the order of +35 %), we only split into tasks up to a certain recursion depth. From then on, we use the single-threaded apply algorithm.

3.3 Safe Abstraction for Modifying Nodes

A challenge when designing OxiDD was to find a safe abstraction for modifying nodes, e.g., during reordering, as it requires synchronization. A lock per node would lead to incorrect results when accessing nodes subject to a level swap, and would moreover be detrimental to performance. Instead, we use a single read/write lock to coordinate exclusive access to the entire DD. A shared, append-only view is sufficient for apply algorithms and most other operations such as model counting or satisfiability checking. Reordering requires exclusive access.
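A minimal sketch of this locking scheme, assuming a hypothetical `Manager` type with `u32` node IDs (OxiDD's shared view additionally allows appending new nodes via interior mutability in the unique table, which is elided here):

```rust
use std::sync::RwLock;

// (level, hi child, lo child) per node, with u32 node IDs.
struct NodeStore {
    nodes: Vec<(u32, u32, u32)>,
}

struct Manager {
    // One reader/writer lock guards the entire diagram.
    store: RwLock<NodeStore>,
}

impl Manager {
    fn new(nodes: Vec<(u32, u32, u32)>) -> Self {
        Self { store: RwLock::new(NodeStore { nodes }) }
    }

    /// Read-only operations such as model counting need only shared access.
    fn node_count(&self) -> usize {
        self.store.read().unwrap().nodes.len()
    }

    /// Reordering takes the write lock: no other thread may observe the
    /// diagram while nodes move between levels.
    fn reorder(&self) {
        let mut store = self.store.write().unwrap();
        // Placeholder for a real level swap.
        store.nodes.sort_by_key(|&(level, _, _)| level);
    }
}
```

The single coarse lock avoids per-node locking overhead entirely on the read path, at the price of blocking all readers for the (comparatively rare) duration of a reordering.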

Once exclusive access is acquired, we must ensure that all nodes we modify actually belong to the manager to which we have exclusive access. To this end, a safety invariant is required: all descendants of a node are stored in the same manager. This is a very natural assumption, also needed for correctness, as it prevents a node in manager A from referencing a node in manager B. As this invariant is needed for safety, there must not be a way to violate it from Safe Rust. The challenge is that when creating a node, there is no efficient way to check the invariant. After all, we only work with edges here, and edges do not (necessarily) provide any information about the manager the node belongs to. Only the function type stores both a node reference and a reference to the respective manager. So, before actually starting an apply operation, we must ensure that the operands (of function type) belong to the same manager, and the entire code in between needs to uphold the invariant. In a naïve implementation, without a proper abstraction, this would require a lot of unsafe code.

We can drastically reduce the amount of unsafe code if every manager has its own edge and node types, as this prevents mixing edges from different managers. To realize this idea without fixing the number of managers upfront, we use branded types as presented by Yanovski et al. [42]. Branded types leverage Rust’s lifetimes. In Rust, a reference is essentially a pointer with the invariant that it is always safe to dereference. As references may point to stack variables, the compiler needs to make sure that the referenced variables do not go out of scope as long as the reference is live. This is done by adding a lifetime to reference types. The lifetime corresponds to the referenced variable’s scope.
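The branding pattern can be sketched as follows (all names are illustrative, not OxiDD's actual API): a closure-taking entry point hands out values whose types carry a fresh, invariant lifetime `'id`, so edges obtained from different calls have incompatible types.

```rust
use std::marker::PhantomData;

// `fn(&'id ()) -> &'id ()` makes the brand lifetime invariant, so the
// compiler cannot shrink or grow it to unify two different brands.
type Brand<'id> = PhantomData<fn(&'id ()) -> &'id ()>;

struct Manager<'id> {
    _brand: Brand<'id>,
}

#[derive(Clone, Copy)]
struct Edge<'id> {
    node: u32,
    _brand: Brand<'id>,
}

/// Runs `f` with a fresh brand: the higher-ranked bound forces `f` to
/// work for *every* lifetime, so the brand cannot leak out of the closure.
fn with_manager<R>(f: impl for<'id> FnOnce(&Manager<'id>, Edge<'id>) -> R) -> R {
    let manager = Manager { _brand: PhantomData };
    let root = Edge { node: 0, _brand: PhantomData };
    f(&manager, root)
}

/// Demands two edges with the *same* brand, i.e., from the same manager.
fn apply_and<'id>(_m: &Manager<'id>, a: Edge<'id>, b: Edge<'id>) -> Edge<'id> {
    Edge { node: a.node.min(b.node), _brand: PhantomData }
}

fn demo() -> u32 {
    fn and_self<'id>(m: &Manager<'id>, e: Edge<'id>) -> u32 {
        apply_and(m, e, e).node
    }
    // Mixing edges from two nested `with_manager` calls would be a type
    // error here, since their brands would differ.
    with_manager(and_self)
}
```

Because `'id` is invariant and introduced only inside `with_manager`, the type checker rejects any attempt to pass an edge of one brand where another brand is expected, which is exactly the guarantee the apply algorithms rely on.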

Fig. 7.
figure 7

Usage of branded types.

As an example, computing the conjunction of functions func_a and func_b works as in Lines 1-5 of Fig. 7. The with_manager_shared method acquires the lock (for shared access) of the manager referenced by func_a. Further, it takes a closure to which it passes the manager reference and the underlying edge. This is the place where the new brand/lifetime is introduced, as indicated in the comment. When converting func_b into its underlying edge in Line 2, we check that it belongs to manager. If this is not the case, we abort the execution with an appropriate error message. Otherwise, we obtain an edge of the same branded type as edge_a. This means that when calling the recursive apply_and function, it can safely assume that the nodes referenced by edge_a and edge_b, as well as all their descendants, are stored in the same manager. This simply follows from type safety. As the branded type is only valid inside the closure, we convert the resulting edge back into a function in Line 4. Notably, if we nest with_manager_shared calls as shown in Lines 6-12 of Fig. 7, we get a compile-time error because the types of edge_a and edge_b have different brands. This safe abstraction enables the implementation of apply algorithms entirely within safe code.

4 Evaluation

OxiDD is designed not only for modularity and safety but also with performance in mind. We (mostly) use zero-cost abstractions and eliminate runtime checks via type invariants. Our evaluation is driven by two research questions:

  • RQ1 How does the single-threaded runtime of OxiDD compare to other popular BDD libraries?

  • RQ2 Can OxiDD achieve similar speed as Sylvan in the multithreaded setting?

As the set of libraries to compare against, we choose BuDDy 2.4, CUDD 3.0.0, and Sylvan 1.8.0, since these are the most popular libraries. Furthermore, we compare against LibBDD 0.5.10, a relatively mature Rust library, and Adiar (commit ca4f7351), which appears to be the most performant external-memory library at large scales. The version of OxiDD corresponds to commit 8113c12. Among this set of libraries, Sylvan and OxiDD are the only multithreaded ones. For a fair comparison, we integrated OxiDD into the bdd-benchmark framework initially developed by Steffan Sølvsten for the evaluation of Adiar [39]. It contains the following set of combinatorial and verification benchmarks:

  • \(N\)-Queens: Given \(N \in [12, 15]\), how many ways are there to place \(N\) queens on an \(N \times N\) chess board without threatening each other?

  • Tic-Tac-Toe: Given \(N \in [20, 24]\), how many ways are there for player 1 to place \(N\) crosses in a 3D \(4 \times 4 \times 4\) cube and tie if player 2 places noughts in all remaining positions?

  • Picotrav: Given a hierarchical circuit, a BDD is constructed for each output. We use this to verify the equality of two circuits. In our case, the circuits are a subset of the EPFL combinational benchmark suite [3].

Input sizes and files are selected based on preliminary experiments regarding resource consumption. Note that complement edges are not beneficial for \(N\)-Queens and Tic-Tac-Toe: Negations occur on variables only, the remaining operations are just conjunctions and disjunctions. This is different for Picotrav.

bdd-benchmark is designed to be generic over the respective BDD library. All benchmarks are written against an abstract adapter that provides operations such as conjunction, disjunction, and negation in the case of BDDs. This means that the same operations are executed with the same variable order, regardless of the DD library in use. In particular, dynamic reordering is disabled. Note that bdd-benchmark is written in C++, so OxiDD’s adapter makes use of the C++ bindings. All libraries except BuDDy use complement edges. Only OxiDD implements both BDDs and BCDDs, as the genericity easily allows us to do so. Since both implementations are based on the same data structures, we also get a relatively good estimate of the performance impact of complement edges. For the remainder of this section, we use “OxiDD” to refer to the BDD implementation and explicitly add “BCDD” otherwise.

We executed the benchmarks on a 16 core / 32 thread AMD Ryzen 9 5950X CPU with 128 GiB of RAM and approximately 800 GiB free SSD space, running Ubuntu 22.04 (Linux kernel 5.15). The libraries were compiled using Clang 16.0.6 or rustc 1.71.1, which are both based on LLVM 16. We set a timeout of 3 hours. To reduce the number of TLB misses during execution, we enabled transparent hugepages by setting /sys/kernel/mm/transparent_hugepage/enabled to always. The default on many systems is that programs have to issue respective madvise calls. OxiDD is the only library that does this to some extent. The performance impact of this setting is quite large: In preliminary experiments we observed a \(1.6 \times \) speedup for 14-Queens with BuDDy. We ran each benchmark three times and report the average running times.

Fig. 8.
figure 8

N-Queens and Picotrav benchmark statistics

4.1 RQ1: Single-thread Performance

Overall, our benchmarks show that for single-threaded execution, BuDDy performs best. OxiDD is slightly slower than BuDDy and faster than all other libraries. In Fig. 8a, we show the runtimes on the \(N\)-Queens benchmark relative to OxiDD. OxiDD takes 4.2 s for \(N = 12\), 24.6 s for \(N = 13\), 2.4 min for \(N = 14\), and 16.1 min for \(N = 15\). On 15-Queens, OxiDD performs best. BuDDy runs out of memory, mainly due to its limitation to \(2^{31}-1\) nodes. As the BDD construction produces more than \(2^{31}\) nodes, staying within such a limit only works with sufficiently many GCs. OxiDD (BCDD) is restricted to \(2^{31}\) nodes (the last bit is needed for complement edges), and the GCs cause OxiDD (BCDD) to be much slower than OxiDD on this specific benchmark instance. Still, OxiDD (BCDD) is faster than CUDD, Sylvan, and LibBDD. For this problem size, breadth-first apply algorithms also start to shine: Adiar is only \(1.03 \times \) slower than OxiDD.

The situation is very similar for the Tic-Tac-Toe problem. For Picotrav, however, complement edges may have a notable impact on the node count. On many instances, the BCDD variant of OxiDD performs slightly better than its BDD variant and BuDDy (see Fig. 8b). All libraries solved the smallest 21 out of 23 instances; on the remaining two, they timed out or ran out of memory.

So with respect to RQ1, we can say that OxiDD is among the best libraries. However, a manager implementation that is not restricted to \(2^{31}\) or \(2^{32}\) nodes, respectively, might be interesting for some use cases.

4.2 RQ2: Multi-thread Performance

From Fig. 8e, we observe that OxiDD’s parallelization is already effective in its initial release. However, for an increasing number of threads, Sylvan performs better. This is probably because locking each level of OxiDD’s unique table leads to contention: 14-Queens has 196 variables/levels, so it is not that unlikely that two out of 32 threads try to acquire the same lock. Notably, OxiDD’s performance for 32 threads is slightly worse than for 16 threads. Especially for the smaller Picotrav instances (cf. Fig. 8d), we also observe a significant slowdown using 32 threads. Sylvan shows a slowdown as well, but not as severe as OxiDD’s. Only for the largest solved instance does Sylvan achieve a significant speedup of \(10.5\times \) for 32 threads (cf. Fig. 8f).

Regarding RQ2, we conclude that Sylvan’s highly optimized parallel engine leads to better performance at high numbers of threads. On large combinatorial problems with at most 16 threads, OxiDD’s parallelization outperforms Sylvan’s. For the verification problems we tested, the current implementation does not achieve parallel speedups. Still, we remark that single-threaded OxiDD significantly outperforms multithreaded Sylvan in all but one Picotrav instance. Note that OxiDD’s parallelization can still be optimized, e.g., by using concurrent hash tables to counter the contention issues mentioned above (cf. Section 3.2).

5 Conclusion

In this paper, we have presented OxiDD, a new decision diagram framework in Rust. OxiDD emphasizes modularity, which eases adding functionality and new kinds of decision diagrams. Our implementations offer high performance and can safely be used in concurrent contexts. Depending on the workload, there may also be significant speedups in multithreaded execution. We demonstrated this by comparing OxiDD’s B(C)DD implementations to other popular BDD libraries. Moreover, we showed how we can leverage Rust’s type system to ensure that edges from different managers cannot accidentally be mixed up. This allowed us to implement the building blocks for dynamic reordering while keeping the apply algorithms entirely in Safe Rust.

Aiming to provide a basis for future research and development, there are plenty of opportunities. First, OxiDD’s B(C)DD, MTBDD, and ZBDD implementations are not yet as feature-rich as matured BDD packages such as CUDD. Adding the remaining operations is, however, facilitated by our modular design. Second, we pointed out that the current unique table is likely to be a bottleneck for concurrent performance. Recently, there have been interesting developments on growing concurrent hash tables [29], which we plan to investigate further. Third, we plan to implement dynamic reordering heuristics relying on our reordering building blocks presented here. Last but not least, the argument that our unsafe code upholds Rust’s invariants is currently informal. Formally verifying OxiDD would be a challenging but rewarding avenue to pursue.