1 Introduction

Computational systems have become more and more ubiquitous in our daily life and manifest themselves in various contexts, including VLSI circuits, software programs, and cyber-physical systems. To construct reliable systems, quality assurance has become an indispensable research topic. Numerous endeavors have been invested for different computational systems. Because of the ever-increasing system complexity and applications in safety-critical missions, it is of vital importance to take advantage of all available solutions for different types of systems to guarantee the quality and correctness.

Formal verification and testing are two active fields of research to analyze and assure the quality of computational systems. The former decides with mathematical rigor whether a system conforms to a specification. The latter aims at generating input patterns and executing a system on a test suite to observe irregular output responses. Studies for formal verification or testing usually focus on a specific computational model, especially a sequential circuit (hardware) or a program (software). Tool competitions are also established based on modeling languages for input instances, such as the language Btor2 [64] used in the Hardware Model Checking Competitions (HWMCC) [28, 29], or the language C assumed by the Competitions on Software Verification (SV-COMP) [11, 14] and Testing (Test-Comp) [12, 13]. Unfortunately, this distinction erects a barrier between the two closely related research communities.

1.1 Our Motivations and Contributions

For the hardware community to easily benefit from state-of-the-art software-analysis techniques, we aim at developing a lightweight yet effective translation flow to bridge the gap between hardware and software analysis. There have been several attempts [48, 62] to compile hardware designs into software, mostly using the language Verilog as the input format. Verilog is a general-purpose hardware description language, and thus, a comprehensive frontend for Verilog requires tremendous engineering effort. Moreover, Verilog has rather complicated syntax and semantics, which might increase the burden on the translation flow.

To address the complexity in the frontend design, we resort to the language Btor2  [64], proposed recently to model word-level sequential circuits. A suite Btor2Tools [63] of utility tools is also provided for conveniently parsing, simulating, and bit-blasting (to the bit-level format Aiger  [26]) Btor2 circuits. We emphasize the following two benefits of using Btor2 as the translation frontend over Verilog. First, Btor2 provides simple yet sufficient operations over bit-vectors and arrays. The simplicity makes it an appropriate intermediate representation for formal verification and testing, as the operations are suitable for the underlying satisfiability solvers. Second, Btor2 is the input format used in the HWMCC. Many hardware model checkers support this format, and a large collection of benchmarking tasks is available for empirical evaluation. In practice, a Verilog circuit can be translated to Btor2 via Yosys  [70], an open-source Verilog synthesis tool. Therefore, using Btor2 as frontend does not restrict the applicability of the translation flow.

Having settled on the frontend, our next question is: Should we make software analyzers support Btor2, or should we implement a standalone translator that does the job for all tools? We take the latter approach such that any software analyzer (from 76 available [25]) can in principle be used for hardware analysis. As opposed to using Verilog as frontend, the simplicity of the Btor2 language helps to generate C programs suitable for the backend analysis, as will be shown in Sect. 5 via comparison with the Verilog-to-C translator v2c [62].

Once a handy translator is viable, we are enthusiastic about empirically comparing hardware and software analyzers on a large scale. Similar experiments have been carried out for bounded [60] and unbounded [61] formal verification on a small set of circuits. By building a translator on top of the Btor2 language, more than a thousand benchmarking tasks from the HWMCC are at our immediate disposal. To draw a more reliable conclusion on the performance comparison of state-of-the-art hardware and software analyzers, we evaluate bit-level and word-level hardware model checkers from HWMCC, software verifiers from SV-COMP, and software testers from Test-Comp, on the HWMCC benchmark set.

Fig. 1. Software analysis made readily available for hardware designs

Our contributions in this paper are summarized below:

Novelty. (1) To bridge the gap between hardware and software analysis, we design and implement Btor2C, the first hardware-to-software compiler taking the format Btor2 [64] as input. Specifically, Btor2C accepts a Btor2 circuit and produces a behaviorally equivalent C program. Given a Verilog design, Btor2C (with the help of Yosys) makes off-the-shelf software verifiers and testers readily available for its analysis. In addition to bit-level and word-level analyzers, hardware developers will be equipped with more tool choices to perfect their designs, as shown in Fig. 1. (2) Btor2C makes it easy to construct new hardware analyzers by prepending the translator in front of any software analyzer. (3) Applying Btor2C to the HWMCC benchmark set, we submitted 1224 new tasks (Footnote 1) to sv-benchmarks, the benchmark collection used by many researchers, including SV-COMP and Test-Comp. Developers of software analyzers can now assess their tools using the hardware-analysis counterparts as a new baseline.

Significance. (1) We conduct a large-scale evaluation involving hardware model checkers, software verifiers, and software testers on the HWMCC benchmark set. Our results show that software-analysis techniques can complement hardware model checkers. (2) The proposed lightweight translator makes software analyzers more accessible to the entire research community, as Btor2 can be used as an intermediate representation for analysis, not limited to hardware designs.

1.2 Example

Fig. 2. An example Btor2 circuit (a) and its translated C program (b)

Figure 2 illustrates the proposed translator Btor2C on an example. A circuit whose state is a bit-vector of width 3 is given in Btor2 format in Fig. 2a. The bit-vector is initialized to 0 (lines 2-4). In every iteration, the value of the bit-vector will be incremented by the value of the external input (lines 5-6) and then decremented by 1 (lines 7-8). The circuit reaches a bad state (i.e., violates the safety property) if the value of the bit-vector equals 0b111 (lines 12-13). The translated C program is shown in Fig. 2b. Btor2C first looks for the sorts used in the input Btor2 file. In this example, bit-vectors of 3 bits and 1 bit are used, and Btor2C encodes them with the shortest possible unsigned integer type unsigned char (lines 4-5). After sort declarations, Btor2C defines constants, declares inputs, and initializes circuit states (lines 6-10). An infinite loop is created to simulate the behavior of a sequential circuit. At the beginning of the loop, the safety property is evaluated. If the property is violated (namely, variable bad_13 evaluates to \( true \)), the program reaches the error location at line 17. Otherwise, the next-state value (stored in variable var_8) is computed and assigned to the current state (lines 19-23), and another loop iteration follows. After the translation, we can apply software verifiers to the translated program in Fig. 2b to check whether the circuit in Fig. 2a conforms to the specified safety property.

2 Related Work

2.1 Compiling Hardware to Software

Several research efforts [48, 68] have been invested into representing a circuit as a program, with the primary goal of accelerating hardware simulation. The work most closely related to ours is the Verilog-to-C translator v2c [62], used to translate hardware circuits into software programs for bounded [60] and unbounded [61] formal verification. Unlike v2c, our translator uses as frontend the Btor2 language, which is simple to parse and suitable for analysis. In Sect. 5, we compare the performance of software analyzers on C programs generated by v2c and our tool Btor2C.

2.2 Compiling Hardware to Intermediate Representation

Another line of research related to our work is the compilation of hardware to an intermediate representation that eases the burden of analysis. The motivation of these works is to interface real-world designs and problems described in a more abstract language with tools that use a primitive model representation. Our tool Btor2C shares a similar spirit because it interfaces problems in hardware analysis with software techniques. Among other tools, Verilog2SMV [51] and Ver2Smv [59] translate a Verilog circuit into SMV format [34, 56], which can be verified by tools like nuxmv [33]. QuteRTL [71] translates a register-transfer-level hardware design (usually in Verilog or VHDL) to Btor [31], an earlier version of Btor2. EBMC [55] generates SMT formulas in SMT-LIB 2 format [8], which encode the bounded model checking or k-induction problems of a Verilog circuit. Yosys [70], which translates a Verilog circuit into the Aiger or Btor2 formats, also serves the same purpose. Recently, there has been interest in developing an intermediate language for the model-checking research community [67]. The project aims at providing an expressive frontend language as well as an efficient interface with backend model checkers.

3 Background

3.1 The Btor2 Language

Btor2 is a bit-precise modeling language for word-level sequential circuits. It can be seen as a generalization of the bit-level Aiger format [26]. The essential ingredients of Btor2 relevant to our discussion in Sect. 4 will be introduced below. For the complete syntax, please refer to the Btor2 publication [64].

Each line in a Btor2 file starts with a unique number, used by other lines to identify the entity defined in this line. Such an entity can be either a sort or a node. A sort is either a bit-vector type of an arbitrary width w, denoted by \(\mathcal {B}^{w}\), or an array type. An array type whose indices and elements are bit-vector types \(\mathcal {I}\) and \(\mathcal {E}\), respectively, is denoted by \(\mathcal {A}^{\mathcal {I}\rightarrow \mathcal {E}}\). A node can be an input, a state, or a result of an operator over other inputs, states, or results. Inputs are external stimuli given to the Btor2 circuit. Memory elements of the circuit are modeled by states. Usually, inputs have bit-vector types, and states can be of either bit-vector or array types.

Operators are the building blocks of a Btor2 circuit. They take arguments of the prescribed types and guarantee a specific type for the result. The general signature for a Btor2 operator is as follows: <id> <op> <id0> <id1> [<id2> [<id3>]], which defines a node to be the computation result of the operator op on node id1 and optionally id2 and id3. The result will have type id0 and can be accessed by id. The operators in Btor2 will be introduced later in Sect. 4 alongside the translation process of Btor2C.

Btor2 also provides constructs like init, next, and bad to describe the safety-reachability problem for sequential circuits. Initial and bad states can be defined by init and bad, respectively. The transition from one state to another is captured by next. In the following, we briefly recap sequential circuits and their model-checking formulation.

3.2 Sequential Circuits and Hardware Model Checking

A sequential circuit is a computational model widely used in the design and analysis of hardware. It consists of a combinational circuit and memory elements. The combinational circuit is in charge of the computation, and the memory elements store the circuit’s state. The combinational circuit is a directed acyclic graph whose vertices are logic gates and edges are wires connecting the gates. If the output pin of gate u is connected to an input pin of gate v, we say that u is a fan-in of v, and v is a fan-out of u.

The computation of sequential circuits is segmented into consecutive time frames. Before the first time frame starts, the memory elements are typically reset (described by init). At the beginning of each time frame, the combinational circuit reads the values stored in the memory elements and receives stimuli from the environment. The former is called the current state of the circuit, and the latter is called the external input in this time frame. Propagating the current state and external input through its logic gates, the combinational circuit computes the output response and the new values to be stored in the memory elements (namely, next-state values, described by next). At the end of the time frame, the next-state values are saved into the memory elements, which become the current state for the next time frame.

The model-checking problem of reachability safety for hardware is formulated as follows: Given a sequential circuit and a safety property (usually encoded as an output of the sequential circuit’s combinational part, described by bad), decide whether the safety property holds on all executions of the sequential circuit. If the property does not hold on some execution, a hardware model checker generates an input sequence to trigger the output, and the sequential circuit is deemed unsafe with respect to the property. Otherwise, the sequential circuit is considered safe, and a model checker might additionally generate (an overapproximation of) the set of reachable states as correctness witness.

3.3 Software Model Checking

The reachability-safety problem for software is formulated similarly to hardware model checking. Given a program and a safety property (usually labeled as an error location in the program), determine whether there is an executable program path that reaches the error location. Although software model checking, unlike its hardware counterpart, is in general undecidable, many research efforts have been invested into automated solutions to this problem [10, 19, 53], including predicate abstraction [5, 42, 47, 50], counterexample-guided abstraction refinement (CEGAR) [6, 36], and interpolation [49, 58]. Together with the advances in SMT solving [9], these solutions make the verification of industry-scale software such as operating-systems code [4, 7, 23, 32, 37, 54] feasible. We are keen to explore how these concepts work on hardware.

4 Translating Btor2 to C

This section describes the proposed translator Btor2C (Footnote 2), implemented in the language C with approximately 1600 lines of code. We first describe the general idea of using C programs to simulate sequential circuits, whose behavior is intrinsically concurrent. The implementations of various Btor2 operators and optimizations in Btor2C are discussed later.

4.1 Simulating Sequential Circuits with C Programs

Sequential circuits work in a concurrent manner: The external input and current state propagate in parallel through the combinational circuitry to produce circuit outputs and next-state values. In contrast, the C programming language is imperative, and hence C programs are generally executed line-by-line.

Fig. 3. A generic program to imitate sequential circuits for reachability safety

To capture the behavior of sequential circuits in the context of reachability safety, Btor2C generates C programs with the generic single-loop program in Fig. 3 as a template. In the generic program, the sorts and constants used in the sequential circuit are defined at the beginning of the main() function. Second, the program initializes the circuit’s states. An endless loop is then used to mimic the state-transition behavior of the circuit throughout time frames: When a loop iteration begins, the safety property is evaluated over the current state and external input. If the property is violated, the program exits with an error. Otherwise, the next-state values are computed and stored into the state variables. This generic program reflects the reachability safety for sequential circuits.
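As a rough illustration of this single-loop structure, the sketch below hand-encodes the example circuit of Sect. 1.2 (a 3-bit state that is incremented by a 1-bit input and then decremented by 1, with bad state 0b111). It is not the output of Btor2C: the names SORT_BV1, SORT_BV3, and run_circuit are hypothetical, the loop is bounded, and inputs are read from an array instead of being nondeterministic, so that the sketch can be executed directly.

```c
#include <stddef.h>

typedef unsigned char SORT_BV1; /* 1-bit vector (hypothetical name) */
typedef unsigned char SORT_BV3; /* 3-bit vector (hypothetical name) */

/* Runs the circuit for at most n time frames; the external input of frame i
 * is inputs[i]. Returns the index of the first frame in which the bad state
 * is observed, or -1 if it is not observed within n frames. */
long run_circuit(const SORT_BV1 *inputs, size_t n) {
    const SORT_BV3 mask = 0x7;           /* keeps the 3 valid bits */
    SORT_BV3 state = 0;                  /* state initialization (init) */
    for (size_t i = 0; i < n; i++) {
        SORT_BV1 input = inputs[i] & 0x1;
        if (state == 0x7)                /* safety property (bad) */
            return (long)i;
        SORT_BV3 sum = (state + input) & mask;
        state = (sum - 1) & mask;        /* state transition (next) */
    }
    return -1;
}
```

With the input sequence 0, 0 the state moves from 0 to 7, so the bad state is observed in the second frame; with a constant input of 1 the state stays at 0.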

The commented blocks in the generic program have to be replaced by C instructions to encode the concurrent computation of the sequential circuit. Btor2C assigns every node in the input Btor2 circuit a unique variable in the translated C program. Nodes used for state initialization, state transition, or safety properties, are specified by keywords init, next, or bad, respectively. For such a node, a backward depth-first traversal is applied to collect its transitive fan-in cone to avoid irrelevant signals regarding model checking. Multiple bad keywords in a Btor2 file are translated to multiple error labels in the C program.
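The backward traversal can be pictured with the following minimal sketch. The Node record, the constant MAX_ARGS, and the function collect_cone are hypothetical and only illustrate a depth-first collection of a transitive fan-in cone; the data structures actually used by Btor2C may differ.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ARGS 3

/* Hypothetical node record: IDs of up to three operands; unused slots are -1. */
typedef struct {
    int args[MAX_ARGS];
} Node;

/* Marks root and every node in its transitive fan-in cone in in_cone.
 * in_cone must have one entry per node and be initialized to false. */
void collect_cone(const Node *nodes, int root, bool *in_cone) {
    if (root < 0 || in_cone[root])
        return;                          /* unused slot or already visited */
    in_cone[root] = true;
    for (size_t i = 0; i < MAX_ARGS; i++)
        collect_cone(nodes, nodes[root].args[i], in_cone);
}
```

Calling collect_cone once per init, next, and bad node with a shared in_cone array leaves unmarked exactly those nodes that are irrelevant to the property. For very deep circuits, an explicit stack would avoid deep recursion.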

4.2 Variable Naming

We use the unique identification numbers for lines in a Btor2 file to name their corresponding variables in the translated C program. Suppose the unique ID of a line is n. If the line defines a sort, it is named SORT_n in the C file. If the line defines a state or an input, it is named state_n or input_n, respectively. If the line defines a node used for state initialization, transition, or property evaluation, it is named init_n, next_n, or bad_n, respectively, to honor the keywords init, next, or bad. For the rest of the nodes, we name their variables var_n in the C file.

4.3 Expressing Btor2 Sorts in C

The language Btor2 supports two sorts: bit-vectors and arrays. Whenever possible, Btor2C represents a bit-vector type \(\mathcal {B}^{w}\) by the shortest unsigned-integer type whose number of bits is greater than or equal to w. For example, a \(\mathcal {B}^{3}\) type with sort ID n is encoded by typedef unsigned char SORT_n;, and a \(\mathcal {B}^{20}\) type with sort ID m is encoded by typedef unsigned int SORT_m;. A Btor2 bit-vector type can have an arbitrary width. If a Btor2 circuit uses a bit-vector type longer than 64 bits, Btor2C cannot translate it to a C program, because no C type can accommodate the bit-vector (Footnote 3). The missing capability to handle bit-vectors longer than 64 bits is a restriction of Btor2C, but the sacrifice is worthwhile: By encoding bit-vectors with integer variables, native C operators can be directly applied to implement Btor2 operators, which greatly simplifies the analysis of translated programs. As can be seen in Sect. 5, the state-of-the-art software verifiers and testers perform decently on the translated programs. In practice, only \(20\,\%\) of the collected Btor2 benchmarking circuits have bit-vectors longer than 64 bits, so we consider the restriction acceptable.
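The choice of storage type can be sketched as follows. The function storage_bits is hypothetical, and the sketch assumes the LP64 data model, in which unsigned char, unsigned short, unsigned int, and unsigned long have 8, 16, 32, and 64 bits, respectively.

```c
#include <assert.h>

/* Returns the bit size of the shortest unsigned-integer type (under LP64)
 * that can hold a bit-vector of width w, or 0 if w exceeds 64 bits. */
unsigned storage_bits(unsigned w) {
    assert(w >= 1);
    if (w <= 8)  return 8;   /* unsigned char  */
    if (w <= 16) return 16;  /* unsigned short */
    if (w <= 32) return 32;  /* unsigned int   */
    if (w <= 64) return 64;  /* unsigned long  */
    return 0;                /* not representable by a standard C integer type */
}

/* Sort declarations for B^3 and B^20 with hypothetical sort IDs 1 and 2. */
typedef unsigned char SORT_1;
typedef unsigned int  SORT_2;
```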

Btor2C represents Btor2 arrays by static arrays. Suppose the sort ID for an array type \(\mathcal {A}^{\mathcal {I}\rightarrow \mathcal {E}}\) is n. Let its index type \(\mathcal {I}\) be \(\mathcal {B}^{w}\) and element type \(\mathcal {E}\) be encoded by SORT_m. Then \(\mathcal {A}^{\mathcal {I}\rightarrow \mathcal {E}}\) is encoded by the following C instruction: typedef SORT_m SORT_n[1 << w];, which means SORT_n is an array with \(2^w\) objects of type SORT_m.
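For a concrete instance, consider an array sort whose index type is \(\mathcal {B}^{4}\) and whose element type is \(\mathcal {B}^{8}\). The sort IDs 1 and 2 and the helper functions below are hypothetical; masking the index to its 4 valid bits is an assumption of this sketch, made to keep accesses in bounds.

```c
#include <stddef.h>

typedef unsigned char SORT_1;    /* element sort B^8 */
typedef SORT_1 SORT_2[1 << 4];   /* array sort with index sort B^4: 16 elements */

/* Number of elements in an array of sort SORT_2. */
size_t sort_2_length(void) {
    return sizeof(SORT_2) / sizeof(SORT_1);
}

/* Reads element i using plain C indexing on the masked index. */
SORT_1 sort_2_read(const SORT_2 a, unsigned i) {
    return a[i & 0xF];
}
```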

4.4 Implementing Btor2 Operators in C

The language Btor2 provides various operations, most of which can be easily implemented by the corresponding C operators. Recall that a bit-vector type \(\mathcal {B}^{w}\) is encoded by the shortest unsigned-integer type that can hold it. As a result, there might be some spare most-significant bits (MSBs) in an unsigned-integer variable. Normally, these bits have to be set to zero (namely, the computation result is taken modulo \(2^w\)) after each operation to guarantee precision. Later in Sect. 4.5, we discuss the possibility of applying the modulo operation to results lazily, only when needed, instead of eagerly after each operator. Such laziness helps to generate shorter C programs and provides an opportunity for software analyzers to work more efficiently. In the evaluation, we will also compare the effects of these two translation schemes. Next, we follow the order of Table 1 in the Btor2 paper [64] to introduce the Btor2 operators and their implementations in C.
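The eager scheme can be sketched for addition as follows. The helpers bv_mask and bv_add are hypothetical names and use uint64_t for all widths for brevity, whereas the translated programs use the per-sort types described in Sect. 4.3.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the w least-significant bits set (1 <= w <= 64). */
uint64_t bv_mask(unsigned w) {
    assert(w >= 1 && w <= 64);
    return w == 64 ? UINT64_MAX : ((UINT64_C(1) << w) - 1);
}

/* Eager scheme: add, then immediately reduce the result modulo 2^w
 * by clearing the spare most-significant bits. */
uint64_t bv_add(uint64_t a, uint64_t b, unsigned w) {
    return (a + b) & bv_mask(w);
}
```

For w = 3, adding 7 and 1 wraps around to 0, as expected for a 3-bit vector.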

Indexed Operators. Unsigned- and signed-extension operators uext and sext can be implemented by type casting during the variable assignment. The bit-slicing operator slice is implemented by first right-shifting the operand by the number of sliced-off least-significant bits and then masking the spare MSBs to zeros.
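A minimal sketch of slice and sext on 64-bit storage is given below. Note that, instead of type casting, this sketch implements sign extension with explicit masks so that it works for widths that do not coincide with a C type; the function names are hypothetical, and operands are assumed to have their spare MSBs cleared.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the w least-significant bits set (1 <= w <= 64). */
static uint64_t bv_mask(unsigned w) {
    return w == 64 ? UINT64_MAX : ((UINT64_C(1) << w) - 1);
}

/* slice: extracts bits u..l (inclusive) of x; the result has width u - l + 1. */
uint64_t bv_slice(uint64_t x, unsigned u, unsigned l) {
    assert(l <= u && u < 64);
    return (x >> l) & bv_mask(u - l + 1);
}

/* sext: extends a w-bit value x by k copies of its sign bit. */
uint64_t bv_sext(uint64_t x, unsigned w, unsigned k) {
    assert(w >= 1 && w + k <= 64);
    uint64_t sign = (x >> (w - 1)) & 1;
    uint64_t upper = bv_mask(w + k) & ~bv_mask(w); /* the k new bit positions */
    return sign ? (x | upper) : x;
}
```

Under the same assumption, uext needs no operation at all, because the spare MSBs of a cleared operand are already zero.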

Unary Operators. The bitwise negation operator not is implemented by its counterpart in C. The arithmetic operators inc, dec, and neg are implemented using the ++, --, and - operators in C. The reduction operator redand (resp. redor) is implemented by comparing the operand to \(2^w-1\) (resp. 0) for an operand of type \(\mathcal {B}^{w}\). As there is no native support in C to compute the sum of all bits modulo 2 (parity) in an integer variable, the reduction operator redxor is implemented by repeatedly shifting and XOR-ing the variable with itself, such that the result will end up in the least-significant bit.
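The three reduction operators can be sketched as below, assuming 64-bit storage with cleared spare MSBs. The function names are hypothetical, and the fixed shift sequence for redxor is one common way to fold the parity into the least-significant bit; the code emitted by Btor2C may look different.

```c
#include <stdint.h>

/* Mask with the w least-significant bits set (1 <= w <= 64). */
static uint64_t bv_mask(unsigned w) {
    return w == 64 ? UINT64_MAX : ((UINT64_C(1) << w) - 1);
}

/* redand: 1 iff all w bits of x are set, i.e., x equals 2^w - 1. */
unsigned bv_redand(uint64_t x, unsigned w) {
    return x == bv_mask(w);
}

/* redor: 1 iff at least one bit of x is set, i.e., x differs from 0. */
unsigned bv_redor(uint64_t x) {
    return x != 0;
}

/* redxor: parity of x, computed by folding x onto itself with shift and XOR
 * until the parity ends up in the least-significant bit. */
unsigned bv_redxor(uint64_t x) {
    x ^= x >> 32;
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return (unsigned)(x & 1);
}
```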

Binary Operators. For bit-vectors, the (in)equality operators eq, neq, gt, gte, lt, and lte are implemented by the corresponding C operators. For arrays, the equality operator is implemented by looping over the two input arrays to find a differing element. Bitwise operators and, or, and xor (Footnote 4) and arithmetic operators add, mul, div, rem (remainder), and sub are all supported in C and can be directly implemented using the respective C operators. In the language Btor2, the result of division-by-zero is defined to be the maximum number of the operands’ sort. Our translation takes this specification into account to generate equivalent C programs. Otherwise, division-by-zero would be considered as undefined behavior in C.
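The two cases that need more than a single C operator, division and array equality, can be sketched as follows. The names bv_udiv and array_eq are hypothetical; the sketch covers unsigned division on 64-bit storage and arrays of unsigned char elements only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mask with the w least-significant bits set (1 <= w <= 64). */
static uint64_t bv_mask(unsigned w) {
    return w == 64 ? UINT64_MAX : ((UINT64_C(1) << w) - 1);
}

/* Unsigned division on w-bit operands: division by zero yields the maximum
 * value 2^w - 1 of the sort instead of triggering undefined behavior. */
uint64_t bv_udiv(uint64_t a, uint64_t b, unsigned w) {
    return b == 0 ? bv_mask(w) : (a / b) & bv_mask(w);
}

/* Array equality: loops over both arrays and stops at the first difference. */
bool array_eq(const unsigned char *a, const unsigned char *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return false;
    return true;
}
```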

Shifting operators sll (logical left shift) and srl (logical right shift) are implemented by the left- and right-shifting operators in C, respectively. According to the ISO C18 standard [52], the result of right-shifting a negative value is implementation-defined. Therefore, to ensure the intended behavior of the arithmetic right-shift operator sra, we always pad ones directly to the resulting value if the given operand is negative (i.e., MSB equals 1). In this way, we do not have to assume any specific implementation of the software verifiers.
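A portable arithmetic right shift along these lines might look as follows. The function bv_sra is a hypothetical helper on 64-bit storage; treating shift amounts of w or more as filling the whole result with the sign bit is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the w least-significant bits set (1 <= w <= 64). */
static uint64_t bv_mask(unsigned w) {
    return w == 64 ? UINT64_MAX : ((UINT64_C(1) << w) - 1);
}

/* sra on a w-bit value: logical right shift on an unsigned type, followed by
 * padding ones into the vacated positions if the MSB of x is 1. */
uint64_t bv_sra(uint64_t x, unsigned s, unsigned w) {
    assert(w >= 1 && w <= 64);
    uint64_t m = bv_mask(w);
    x &= m;
    uint64_t sign = (x >> (w - 1)) & 1;
    if (s >= w)
        return sign ? m : 0;       /* every result bit is a copy of the sign */
    uint64_t r = x >> s;
    if (sign)
        r |= m & ~(m >> s);        /* set the s most-significant result bits */
    return r;
}
```

Because only unsigned values are shifted, the result does not depend on how a compiler or verifier treats right shifts of negative values.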

Concatenating and rotating operators concat, rol (rotating left), and ror (rotating right), are not natively supported in C. We implemented them by shifting and bitwise disjunction. For example, in order to concatenate node \(n_1\) of type \(\mathcal {B}^{3}\) and node \(n_2\) of type \(\mathcal {B}^{5}\), we use (var_1 << 5) | var_2, assuming var_1 and var_2 are of type unsigned char.
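The shift-and-disjunction idea can be sketched as below. The function names are hypothetical, operands of concat_3_5 are assumed to have cleared spare MSBs, and the rotation helpers use 64-bit storage with the rotation amount reduced modulo w.

```c
#include <stdint.h>

/* Mask with the w least-significant bits set (1 <= w <= 64). */
static uint64_t bv_mask(unsigned w) {
    return w == 64 ? UINT64_MAX : ((UINT64_C(1) << w) - 1);
}

/* concat of a B^3 value (upper part) and a B^5 value (lower part) into B^8. */
unsigned char concat_3_5(unsigned char var_1, unsigned char var_2) {
    return (unsigned char)((var_1 << 5) | var_2);
}

/* rol: rotates the w-bit value x left by r positions. */
uint64_t bv_rol(uint64_t x, unsigned r, unsigned w) {
    uint64_t m = bv_mask(w);
    x &= m;
    r %= w;
    if (r == 0)
        return x;                  /* avoids a shift by w positions */
    return ((x << r) | (x >> (w - r))) & m;
}

/* ror: rotates the w-bit value x right by r positions. */
uint64_t bv_ror(uint64_t x, unsigned r, unsigned w) {
    uint64_t m = bv_mask(w);
    x &= m;
    r %= w;
    if (r == 0)
        return x;
    return ((x >> r) | (x << (w - r))) & m;
}
```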

The read operator for array types, which takes an array and an index, is simply implemented by C’s syntax to access an array.

Ternary Operators. The if-then-else operator ite works both for bit-vectors and arrays. It is implemented by the ternary operator exp1 ? exp2 : exp3 in C.

The write operator takes an array, an index for where to write, an element for what to write, and returns an updated array. It is implemented using the standard syntax in C to modify the content of an array.

Note that in a Btor2 file, a line with operator write essentially creates a new copy of the original array with one updated element. The original array is not replaced, because it might also be referred to by other lines. In principle, if no lines access the original array after a write operation, the operation could modify the element in place without allocating a new array. For now, Btor2C always copies a new array during a write operation for simplicity.
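The copy-on-write behavior can be sketched as follows for an array sort with a 2-bit index. The sort names and the function array_write are hypothetical, and masking the index to its valid bits is an assumption of this sketch.

```c
#include <string.h>

typedef unsigned char SORT_1;    /* element sort */
typedef SORT_1 SORT_2[1 << 2];   /* array sort with a 2-bit index: 4 elements */

/* write: dst becomes a copy of src with element idx replaced by elem.
 * src is left untouched, so other nodes can still refer to the old array. */
void array_write(SORT_2 dst, const SORT_2 src, unsigned idx, SORT_1 elem) {
    memcpy(dst, src, sizeof(SORT_2));
    dst[idx & 0x3] = elem;
}
```

An in-place variant would skip the memcpy when no later line reads src, which is the optimization mentioned above as possible future work.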

4.5 Applying Modulo Operations Lazily

Observe that there are some operators that can work correctly without precise operand values, which offers us the opportunity to apply modulo operations lazily and save some computations in translated programs. For instance, consider the addition operator. If \(a_1 \equiv a_2 \pmod {n}\) and \(b_1 \equiv b_2 \pmod {n}\), we conclude that \(a_1+b_1 \equiv a_2+b_2 \pmod {n}\) according to modular arithmetic. In other words, the addition operator does not need precise operands and works correctly for modular numbers (i.e., equivalence classes modulo n). By contrast, other operators might yield different results for modular numbers. For example, \(a+kn>b\) does not guarantee \(a>b\) when \(k>0\). Therefore, performing the modulo operation to the result of an operator is only necessary where the result is used in another operator that requires precise operand values.
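The following sketch contrasts the two schemes on a 3-bit increment-then-decrement chain stored in unsigned char, and shows why a comparison must not see uncleared spare bits. All function names are hypothetical.

```c
/* Eager scheme: reduce modulo 2^3 after every operation. */
unsigned char eager_chain(unsigned char s, unsigned char in) {
    unsigned char t = (s + in) & 0x7;
    return (t - 1) & 0x7;
}

/* Lazy scheme: the intermediate sum keeps its spare bits; a single
 * reduction is applied at the end, before the value is compared. */
unsigned char lazy_chain(unsigned char s, unsigned char in) {
    unsigned char t = s + in;
    return (t - 1) & 0x7;
}

/* A comparison on raw storage: it gives the intended result only if both
 * operands have been reduced modulo 2^w beforehand. */
int raw_gt(unsigned char a, unsigned char b) {
    return a > b;
}
```

Both chains agree because addition and subtraction respect congruence modulo \(2^3\). By contrast, raw_gt(1 + 8, 2) holds although 1 > 2 does not, which corresponds to the case \(a+kn>b\) with \(k=1\) and \(n=8\).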

Btor2C provides an option for the lazy application of modulo operations. If the option is turned on, Btor2C analyzes whether the precise value is required for each node by looking at the node’s fan-outs. If any of its fan-outs needs the precise computation result of the node, the modulo operation will be applied to it. Otherwise, the modulo operation will be skipped, and the result could be a modular number of the precise value. Operators that require precise operand values mainly include inequalities as well as indices for reading and writing arrays. As an example, if we enable the lazy behavior to translate the Btor2 circuit in Fig. 2a, the modulo operations in line 13 and line 20 of the program in Fig. 2b can be omitted, because input_5 and var_6 are used only in addition and subtraction, which do not need precise operand values.
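The fan-out analysis can be pictured with the sketch below. The Op enumeration, the Node record, and the classification in needs_precise_operands are hypothetical simplifications: only an unsigned less-than comparison and an array read are treated as operators requiring precise operands, and all operands of such an operator are marked.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { OP_INPUT, OP_ADD, OP_SUB, OP_ULT, OP_READ } Op;

/* Hypothetical node record: operator and up to two operand IDs (-1 if unused). */
typedef struct {
    Op op;
    int args[2];
} Node;

/* Simplified classification of operators that need precise operand values. */
static bool needs_precise_operands(Op op) {
    return op == OP_ULT || op == OP_READ;
}

/* Sets mask_needed[i] iff some fan-out of node i requires a precise value. */
void mark_masked(const Node *nodes, size_t n, bool *mask_needed) {
    for (size_t i = 0; i < n; i++)
        mask_needed[i] = false;
    for (size_t i = 0; i < n; i++) {
        if (!needs_precise_operands(nodes[i].op))
            continue;
        for (size_t j = 0; j < 2; j++)
            if (nodes[i].args[j] >= 0)
                mask_needed[nodes[i].args[j]] = true;
    }
}
```

For a chain of additions that ends in a comparison, only the operands of the comparison are marked, so the intermediate sums can skip the modulo operation.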

4.6 Discussion

Correctness of the Translation. As will be seen in Sect. 5, the reliability of Btor2C is empirically validated over a large input set: The answers of most software verifiers on the translated C programs are consistent with those of the hardware verifiers on the original circuits. For Btor2 models that violate the safety property, the violation witness generated by software verifiers can be transformed to that of the original Btor2 circuit as a certificate of the translation process. The Btor2Tools utility suite offers a simulator to check the transformed witness against the Btor2 model.

Limitations. The current version of Btor2C has no support yet for the translation of fairness constraints (keyword fair), liveness properties (keyword justice), and overflow detection (keywords addo, divo, mulo, and subo). In our evaluation, only supported keywords appear in the collected Btor2 circuits.

5 Evaluation

We evaluate the claims presented in Sect. 1.1 using the following research questions:

  • RQ1: How do software analyzers perform on hardware-verification tasks?

  • RQ2: Can software analyzers complement hardware model checkers?

  • RQ3: What is the effect of the optimization in Sect. 4.5 on the verification of the translated C programs?

  • RQ4: How effective is the proposed translator Btor2C in comparison with the Verilog-to-C translator v2c  [62]?

To answer the above research questions, we evaluated the state of the art of hardware and software analyzers over a large benchmark set consisting of more than a thousand hardware-verification tasks.

5.1 Benchmark Set

We collected hardware-verification tasks in both Btor2 and Verilog formats from various sources, including the benchmark suites used in the 2019 and 2020 Hardware Model Checking Competitions [29] and the explicit-state model-checking tasks derived from the BEEM project [65]. The whole benchmark set as well as a complete list of sources are available in the reproduction artifact [16] of this paper. We also contributed a set of verification tasks to the sv-benchmarks collection, the largest freely available benchmark set of the verification and testing community.

As the proposed translator Btor2C uses Btor2 as frontend, we translated tasks in Verilog to Btor2 with Yosys  [70]. An aggregate of 1912 Btor2 tasks were collected. We excluded 414 tasks with bit-vectors longer than 64 bits, because Btor2C cannot translate these tasks into standard ISO C18 programs. Out of the remaining 1498 Btor2 tasks, 1341 use only bit-vector sorts, and the remaining 157 tasks manipulate both bit-vector and array sorts. The bit-vector category contains 473 unsafe tasks (with a known specification violation) and 868 safe tasks (for which the specification is satisfied). The array category contains 17 unsafe and 140 safe tasks.

We translated the remaining 1498 Btor2 tasks into C programs by the proposed tool Btor2C (tag tacas23-camera), assuming the LP64 data model. The 1341 tasks in the bit-vector category are also translated to Aiger by the translator Btor2AIGER, which is provided in the Btor2Tools utility suite. The original Btor2 models as well as the translated C programs and Aiger circuits are available in the reproduction package [16] and online (Footnote 5).

Unfortunately, Btor2AIGER does not translate Btor2 circuits with array sorts to Aiger. In our benchmark set, translating a Btor2 file to either a C program or an Aiger circuit took less than a second. Therefore, we ignore the translation time in the run-time of compared tools. An input task with the required format is directly given to each tool. To facilitate the comparison with v2c, we additionally gathered 22 C programs translated by v2c from its repository (Footnote 6).

5.2 State-of-the-Art Hardware and Software Analysis

To adequately reflect the state of the art of hardware and software analysis, we evaluated the most competitive tools from the Hardware Model Checking Competitions and Competitions on Software Verification and Testing. A wide range of analysis techniques implemented in these tools were investigated in our experiment. Due to space limitation, Sect. 5.4 will show the best configuration of each tool on our benchmark set.

Hardware Model Checkers. For hardware analysis, we selected the state-of-the-art bit-level model checker ABC [30] (commit a9237f5, Footnote 7) and AVR [46] version 2.1, a word-level hardware model checker that won HWMCC 2020. The former takes Aiger circuits as input, and the latter directly consumes Btor2 models. We evaluated the implementations of bounded model checking (BMC) [27] and property directed reachability (PDR) [41, 45] in both ABC and AVR. Interpolation-based model checking (IMC) [57] in ABC and k-induction (KI) [69] in AVR were also assessed.

Software Analyzers. For software verifiers, we enrolled the first-, second-, and fourth-ranked verifiers VeriAbs [2], CPAchecker [20], and Esbmc [43] of category ReachSafety in SV-COMP 2022. The third-ranked verifier PeSCo [66] was omitted because it selects algorithms from the CPAchecker framework. All verifiers were downloaded from the archiving repository (Footnote 8) of the competition. (For Esbmc, the performance of an earlier version in SV-COMP 2021 was better than the latest version on our benchmark set, so we used the older version instead.) We tried the implementations of loop abstraction (LA) [38] in VeriAbs; predicate abstraction (PA) [18, 50], Impact [24, 58], and IMC [21] in CPAchecker; BMC and KI [17, 18, 39, 44] in both CPAchecker and Esbmc.

For software testers, the overall winner FuSeBMC [3] of Test-Comp 2022, which implements fuzz testing (fuzzing), was picked. We also experimented with other testers from the competition, but they failed to generate test suites on our benchmark set. FuSeBMC was downloaded from the archiving repository (Footnote 9) of the competition.

In the following discussion, we use \(\langle \textit{tool}\rangle \text {-}\langle \textit{algorithm}\rangle \) to denote the implementation of a specific algorithm in a particular tool. For example, AVR-KI refers to the k-induction implementation in AVR.

5.3 Experimental Setup

All experiments were conducted on machines running Ubuntu 22.04 (64 bit), each with a 3.4 GHz CPU (Intel Xeon E3-1230 v5) with 8 processing units and 33 GB of RAM. Each task was limited to 2 CPU cores, 15 min of CPU time, and 15 GB of RAM. We used BenchExec (Footnote 10) [22] to ensure reliable resource measurement and reproducible results.

Table 1. Summary of the results for hardware and software verifiers (suffixes -e and -l stand for applying modulo operations eagerly or lazily, respectively)

Fig. 4. Quantile plots for all correct proofs and alarms of bit-vector tasks

Fig. 5. Quantile plot comparing bug hunting (with BMC) on bit-vector tasks

5.4 Results

RQ1: Solving HW-Verification Tasks with SW Analyzers. To study the performance of software analyzers on hardware-verification tasks, we compared the selected software tools against the state-of-the-art hardware model checkers. The results are summarized in Table 1.

Note that some software verifiers are good at finding bugs in these tasks. VeriAbs found the most correct alarms in the experiment, and Esbmc also detected more bugs than AVR. By contrast, hardware model checkers were better at computing correctness proofs. Even the best software configuration for proving correctness, CPAchecker-PA, delivered fewer than half of the proofs for bit-vector tasks. In the array category, AVR delivered 45 correct proofs, whereas the software verifiers could not solve any of these tasks. Our results may inspire tool developers to investigate and alleviate the performance difference. Since we have contributed a category ReachSafety-Hardware of verification tasks to the common benchmark collection, the 2023 competition results of SV-COMP include evaluations of all participating tools on those new tasks.

The quantile plots of correct proofs and alarms for bit-vector tasks are shown in Fig. 4a and Fig. 4b, respectively. A data point (x, y) in the plots indicates that there are x tasks correctly solvable by the respective tool within a CPU time of y seconds each. In our experiments, ABC is the most efficient and effective tool in producing proofs, and VeriAbs is the best for bug hunting. While the number of alarms found by Esbmc is higher than that of AVR and close to that of ABC, Esbmc generally spent more time finding bugs.
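To make the reading of the quantile plots concrete, the following Python sketch derives the data points of such a plot from the CPU times of the tasks that a tool solved correctly. The function name `quantile_points` and the sample times are hypothetical and serve only as an illustration; they are not taken from our measurements.

```python
def quantile_points(cpu_times):
    """Return the (x, y) data points of a quantile plot.

    cpu_times: CPU times (in seconds) of the tasks a tool solved correctly.
    A point (x, y) states that x tasks are each solvable within y seconds,
    so the points are obtained by sorting the times in ascending order.
    """
    return [(x, y) for x, y in enumerate(sorted(cpu_times), start=1)]


# Hypothetical CPU times of four correctly solved tasks.
points = quantile_points([12.0, 0.5, 3.2, 0.9])
print(points)
```

A curve that reaches further to the right therefore corresponds to a tool that solved more tasks, and a lower curve corresponds to a faster tool.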

In our evaluation, we observe that PDR is the most competitive algorithm for both hardware model checkers, whereas software verifiers show diverse strengths in different approaches. To account for the difference in algorithms, we also compare implementations of the same algorithm in various analyzers.

BMC is one of the most popular formal approaches to detect errors. It is implemented by most of the evaluated tools. Software testers are also able to hunt bugs, and hence we include FuSeBMC, a derivative of Esbmc that combines BMC and fuzzing, in the comparison. Figure 5 shows the quantile plot of correct alarms for unsafe bit-vector tasks. Note that the performance of the BMC implementations in software verifiers is close to that of the implementations in hardware verifiers. However, FuSeBMC did not perform as well as its competitors, indicating that fuzzing might not be fruitful for our benchmark set.

We also performed a head-to-head comparison of the k-induction implementations in AVR and Esbmc over the bit-vector and array tasks. Both tools rely on SMT solving for formula reasoning, so there are fewer confounding variables than for other combinations of tools. Figure 6 shows the scatter plots for the CPU time and memory usage of AVR and Esbmc to produce correct results. A data point (x, y) in the plots indicates the existence of a task correctly solved by both tools, for which Esbmc took x units of the computing resource and AVR took y units. AVR was often more efficient than Esbmc, but the latter solved 13 tasks that the former could not solve.

Fig. 6. CPU time (left) and memory (right) consumption of AVR-KI and Esbmc-KI

RQ2: Complementing HW Model Checkers with SW Analyzers. Overall, hardware model checkers performed better than software analyzers on our benchmark set, which is expected since they have been heavily optimized for hardware-verification tasks. However, comparing the results of the tools in Table 1, we observed 43 tasks that were solved only by software verifiers. Interestingly, 39 of these uniquely solved tasks have a violated property. Combining BMC with loop-unwinding heuristics, e.g., the technique implemented in VeriAbs [2], is helpful to find bugs in these tasks. This phenomenon demonstrates that software-analysis techniques are able to complement hardware model checkers, which is facilitated by the proposed Btor2C translator. Some potential reasons affecting the effectiveness and efficiency of software analyzers will be discussed in Sect. 5.5.
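The notion of uniquely solved tasks amounts to a set difference between the tasks solved by any software verifier and those solved by any hardware model checker. The following Python sketch assumes that the correct results of each configuration are available as a set of task names; the function name and the task names `t1` to `t5` are hypothetical placeholders and not data from our experiment.

```python
def uniquely_solved(sw_results, hw_results):
    """Tasks solved by some software verifier but by no hardware model checker.

    Both arguments map a configuration name to the set of tasks that the
    configuration solved correctly.
    """
    solved_by_sw = set().union(*sw_results.values())
    solved_by_hw = set().union(*hw_results.values())
    return solved_by_sw - solved_by_hw


# Hypothetical result sets; only the computation is of interest here.
sw = {"VeriAbs-LA": {"t1", "t2", "t3"}, "Esbmc-KI": {"t2", "t4"}}
hw = {"ABC-PDR": {"t2", "t5"}, "AVR-PDR": {"t4"}}
print(sorted(uniquely_solved(sw, hw)))
```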

RQ3: Optimization in Btor2C. Section 4.5 presented an optimization technique that applies modulo operations to intermediate results lazily, in order to generate shorter C programs. To assess whether this technique benefits the downstream software analysis, we compared the performance of the selected software verifiers, CPAchecker, Esbmc, and VeriAbs, on C programs translated by Btor2C with or without this optimization (namely, applying modulo operations lazily or eagerly, respectively).
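The difference between the two modes can be illustrated with a small Python sketch. It is not the code emitted by Btor2C (the actual rules are those of Sect. 4.5); the width of 5 bits, the expression (a + b) * c, and the function names are assumptions chosen for illustration. The sketch relies on the fact that addition and multiplication commute with reduction modulo 2^w, so a reduction after the last operation yields the same result as a reduction after every operation.

```python
WIDTH = 5                # hypothetical bit-vector width
MASK = (1 << WIDTH) - 1  # for x >= 0, x & MASK equals x mod 2**WIDTH


def eager(a, b, c):
    """Apply the modulo operation after every arithmetic operation."""
    s = (a + b) & MASK
    p = (s * c) & MASK
    return p


def lazy(a, b, c):
    """Apply the modulo operation once; the intermediate sum keeps spare bits."""
    return ((a + b) * c) & MASK


# Both variants agree on all 5-bit operands.
agree = all(
    eager(a, b, c) == lazy(a, b, c)
    for a in range(1 << WIDTH)
    for b in range(1 << WIDTH)
    for c in range(1 << WIDTH)
)
print(agree)
```

The same argument carries over to the unsigned integer types of C, whose arithmetic wraps around modulo a larger power of two. Operations whose result depends on the upper bits of their operands, such as division, comparison, or right shift, still require reduced operands, so laziness cannot be applied unconditionally.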

The results of the best-performing algorithm for each tool in terms of the number of correct answers are summarized in Table 1, whose right panel also shows the results of the verifiers on these two sets of C programs. (CPAchecker-BMC actually solved more tasks than CPAchecker-PA, but mainly by finding bugs. Therefore, we reported the second-best configuration, predicate abstraction, for CPAchecker.) If modulo operations are applied lazily instead of eagerly, the number of overall correct results increases by roughly 2.2 % for both CPAchecker and Esbmc, and by 0.3 % for VeriAbs. Although VeriAbs found 4 fewer correct proofs if modulo operations are applied lazily, it reported 5 more correct alarms. Therefore, we conclude that generating shorter C programs by reducing modulo operations is an effective optimization in Btor2C. From now on, Btor2C enables this optimization by default.

RQ4: Comparison with v2c. Btor2C is a lightweight tool, whose compiled binary is smaller than 0.25 MB. By contrast, the precompiled v2c executable downloaded from its web archive (footnote 11) is 5.7 MB. While such a difference is negligible given the capability of modern computers, we believe that a simple frontend language benefits tool implementation.

Besides implementation complexity, we also investigated the efficiency of the translation process. As mentioned in Sect. 5.1, Btor2C took less than a second to translate any Btor2 model in the benchmark set. Unfortunately, the v2c executable in the archive was not runnable, nor was its source code compilable (footnote 12). Therefore, we were not able to directly compare the translation efficiency of Btor2C and v2c.

Table 2. Results for 22 programs generated by Btor2C and v2c

As an alternative, we collected 22 C programs from v2c's benchmark repository and manually adapted them to the syntax rules used in SV-COMP. The original Verilog circuits of these C programs were translated to Btor2 by Yosys and further translated by Btor2C into another set of C programs. We compare the performance of the evaluated software verifiers on these two sets of 22 verification tasks in Table 2. Observe that the three verifiers produced more correct results on the C programs generated by Btor2C, showing the benefit of using Yosys + Btor2 as frontend in the translation flow.

5.5 Discussion

From the experimental results shown above, we observe a notable performance difference between software and hardware analyzers. There are several possibilities to explain this outcome: First, the tasks were encoded in different formats for software and hardware analyzers. Btor2C encoded bit-vectors with unsigned integer types, which may contain some spare bits that complicate software analysis. Second, each analyzer uses a different backend logical solver. ABC encodes queries in propositional logic and uses SAT solving, while other tools resort to first-order formulas and SMT solving. (In our experiments, AVR used Yices2 [40], CPAchecker used MathSAT5 [35] for predicate abstraction and Boolector3 [64] for BMC, and Esbmc used Boolector3.) The capabilities of the solvers may affect the analyzers' performance. Third, the internal modeling used by the analyzers varies. Software verifiers typically represent a program as a control-flow graph, which might be unnecessarily complex when the problem at hand is merely a state-transition system. Despite these disadvantages, software verifiers were able to solve 43 tasks that the considered hardware model checkers could not solve.

6 Conclusion

Assuring the correctness of computational systems is challenging yet imperative. Therefore, we should embrace every opportunity to analyze our systems by removing the barriers between research communities. We implemented the lightweight and open-source tool Btor2C for translating sequential Btor2 circuits to C programs, to enable the application of off-the-shelf software analyzers to hardware designs. We conducted a large-scale experiment including more than a thousand verification tasks. State-of-the-art bit-level and word-level model checkers as well as software verifiers and testers were evaluated empirically. Thanks to the simplicity of the Btor2 language, software analyzers performed decently on the translated programs and complemented the hardware model checkers by detecting more bugs and uniquely solving 43 tasks in our experiment. Our translator Btor2C opens up a new spectrum of analysis options to hardware developers and verification engineers. The translator also simplifies the construction of a new set of hardware analyzers, because any software analyzer can now be used to solve hardware-verification tasks, with Btor2C as a preprocessing step. In the future, we wish to bridge the gap from the other direction. That is, we aim at translating programs into circuits and applying hardware analyzers to solve software problems.