
1 Introduction

Model checking plays an influential role in modern hardware design [4]. Its great success is inseparable from propositional methods such as Binary Decision Diagrams (BDDs) [10] and Boolean SATisfiability (SAT) solvers [14]. Since BMC [6] was introduced, influential hardware model checking methods such as IMC [20], IC3 [9], and CAR [18] have all been SAT-based. At the same time, many important efforts have been made to apply SAT-based model checking techniques to word-level verification tasks whose background theories are fragments of first-order logic [7, 11, 16, 19, 23]. These works all rely on more expressive reasoning engines, namely Satisfiability Modulo Theories (SMT) [3] solvers. As the performance of SMT solvers continues to improve [1, 22], word-level hardware model checking has become a promising research area. Word-level reasoning is more powerful and opens up many possibilities for simplification [5]. Strong evidence of this is that a word-level model checker, AVR [17], achieved the best results in the most recent hardware model checking competition [2].

Implementing word-level reasoning tools such as SMT solvers and word-level model checkers is much more complex and difficult than implementing bit-level tools. Word-level model checking is still a developing and immature area, and there is an urgent need for a large number of diverse benchmarks that can be used for bug finding and performance evaluation. Responding to this need, we present FuzzBtor2, a fuzzing tool that generates random word-level model checking problems. We choose Btor2 [21], a simple, line-based, and easy-to-parse format, as the format of the output files. Btor2 is also the current official format of the hardware model checking competition [2]. Most mainstream word-level model checkers support the Btor2 format either directly (AVR and Pono [19]) or indirectly (nuXmv [11] and IC3ia [13]). To evaluate whether FuzzBtor2 is practical, we test two state-of-the-art word-level model checkers, AVR and Pono, both of which read Btor2 files directly, on Btor2 files generated by FuzzBtor2; the generated test cases trigger various errors in both checkers. We expect FuzzBtor2 to become part of the infrastructure for the development of word-level model checkers.

2 Word-Level Model Checking and Btor2 Format

We assume that the reader is familiar with standard first-order logic terminology [3]. Words generally refer to terms of bit-vector sort, optionally combined with other theories. The background theory of Btor2 is the Quantifier-Free theory of Bit Vectors with the Arrays extension (QF_ABV), in which almost all computer-system behavior can be encoded. Invariant properties are among the most important property classes to verify.

A model checking problem consists of a transition system and a property to verify. A transition system is a tuple \(S=(V,I,T)\) where

  • V is the set of variables of the present state, and \(V'\) is the corresponding set of next-state variables;

  • I is a set of formulas corresponding to the set of initial states;

  • T is a set of formulas over \(V\cup V'\) for the transition relation.

Given a transition system \(S=(V,I,T)\), its state space is the set of possible variable assignments. I and T determine the reachable state space of S. A bad property is represented by a formula \(\lnot P\) over V. A model checking problem can then be defined as follows: either prove that P holds in every reachable state of S, or disprove P by producing a counterexample. In the former case the system is safe, and in the latter case it is unsafe. Some transition systems also contain input variables, which can be modeled as state variables whose next-state values are unconstrained. Assume that a Btor2 file includes \(n_s\) state variables, \(n_c\) constraints, and \(n_b\) bad properties. Its initial state space is described by \(n_s\) init-formulas. The transition relation consists of \(n_s\) next-formulas and \(n_c\) constraint-formulas, and the bad properties consist of \(n_b\) bad-formulas. The sorts of init-formulas and next-formulas must be consistent with those of the corresponding state variables, while constraint-formulas and bad-formulas are of Boolean sort.
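As an illustration, the following is a minimal Btor2 model in the style of the examples in the Btor2 literature [21] (hand-written here for exposition, not produced by FuzzBtor2): a single 3-bit state variable initialized to zero, a next-formula that increments it, and a bad property stating that the counter reaches the all-ones value.

```
1 sort bitvec 3
2 zero 1
3 state 1        ; 3-bit counter state variable
4 init 1 3 2     ; counter initialized to 0
5 one 1
6 add 1 3 5
7 next 1 3 6     ; next(counter) = counter + 1
8 ones 1
9 sort bitvec 1
10 eq 9 3 8
11 bad 10        ; bad: counter = 111
```

Here \(n_s=1\), \(n_c=0\), and \(n_b=1\): line 4 is the init-formula, line 7 the next-formula, and line 11 the bad-formula of Boolean sort, as required above.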

3 The FuzzBtor2 Tool

FuzzBtor2 is an open-source tool consisting of approximately 2400 lines of C++11 code. It does not rely on any external libraries and is fully self-contained. In this section we introduce the usage and architecture of FuzzBtor2. The tool is available at https://github.com/CoriolisSP/FuzzBtor2.

3.1 Usage

The command to execute FuzzBtor2 on Linux systems is ./fuzzbtor [options]. We present the usage and features of FuzzBtor2 along with the options below.

--seed INT This option sets the seed of the random number generator. Keeping the other options fixed, we can generate different test cases by changing the value of the random seed. The default seed is 0.

--to-vmt Verification Modulo Theories (Vmt) [12], an extension of Smt-Lib2 [3], is also used to represent symbolic transition systems and the properties to verify. vmt-tools [15] is a tool suite for the Vmt format that provides a translator from Btor2 to Vmt. However, vmt-tools supports only a subset of the operators in Btor2. With this option, the generated Btor2 files include only the operators supported by vmt-tools, so that they can be translated into the Vmt format to test model checkers that take Vmt files as input (e.g., IC3ia [13]).

--bv-states INT, --arr-states INT These options specify the numbers of bit-vector and array state variables. The default values are 2 and 0 respectively.

--max-inputs INT This option specifies the maximum number of input variables in the generated Btor2 file. The actual number of input variables in the generated file may be smaller than the maximum. The default value is 1.

--bad-properties INT, --constraints INT These two options specify the numbers of bad properties and constraints in the generated Btor2 file, and the default values are 1 and 0 respectively. The fuzzer currently does not support generating liveness properties and fairness constraints.

--max-depth INT A word-level model checking problem, consisting of a transition system and properties to verify, is essentially a set of first-order logic formulas. Formulas are represented by syntax trees in FuzzBtor2, so a word-level model checking problem corresponds to a set of syntax trees. This option specifies the maximum depth of these syntax trees. The default value is 4.

--candidate-sizes RANGE|SET This option provides FuzzBtor2 with a set of positive integers that is used to determine the sorts of variables: the sizes of bit-vector variables, as well as the sizes of the indices and elements of array variables, are all drawn from this set. The default set is \(\{s\in \mathbb {Z}\mid 1\le s\le 8\}\). Note that this option does not allow defining a specific sort directly.
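How the RANGE and SET argument forms might be parsed is not specified by the paper; the following is a rough sketch under our own assumptions (the function name and the exact syntax handling are hypothetical, not FuzzBtor2's actual implementation), accepting a range such as 1..8 or an explicit set such as 1,3,5:

```python
def parse_candidate_sizes(arg: str) -> set:
    """Parse a RANGE argument ("1..8") or a SET argument ("1,3,5")
    into a set of positive integers (candidate bit-widths)."""
    if ".." in arg:
        # RANGE form: "LO..HI", inclusive on both ends
        lo, hi = arg.split("..")
        sizes = set(range(int(lo), int(hi) + 1))
    else:
        # SET form: comma-separated positive integers
        sizes = {int(s) for s in arg.split(",")}
    if not sizes or min(sizes) < 1:
        raise ValueError("candidate sizes must be positive integers")
    return sizes
```

For example, parse_candidate_sizes("1..8") yields the default set \(\{1,\dots,8\}\) described above.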

3.2 Architecture

The architecture of FuzzBtor2 consists of a preprocessor, a generator, and a printer. Users of FuzzBtor2 specify only some arguments on the command line; no other input is given. From the command line arguments, the preprocessor sorts out the information required by the generator and saves it as a configuration. According to this configuration, the generator constructs a set of syntax trees satisfying the number and sort requirements stated in Sec. 2. These syntax trees encode a set of first-order logic formulas, which is essentially a model checking problem independent of the Btor2 format. Finally, the printer outputs the syntax trees constructed by the generator in the Btor2 format.


The generator is the key component of FuzzBtor2. It constructs a syntax tree recursively: a syntax tree with depth greater than 1 consists of sub-syntax trees, an operator, and possibly some parameters (only for indexed operators). When the recursion reaches the base case, i.e., a leaf node of the syntax tree, it randomly returns either a (state or input) variable or a constant, according to a certain probability. Because the number and sorts of variables are limited, when the generator chooses to return a variable, it may encounter a situation where the required leaf node cannot be constructed. Therefore, FuzzBtor2 does not guarantee that a Btor2 file can be successfully generated, and some parameter combinations may cause the construction to fail. The overall process of constructing a syntax tree is described in Algorithm 1.
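The recursive construction can be sketched as follows. This is a simplified illustration of the scheme described above, not FuzzBtor2's actual C++ code: the operator set, sort handling, and probabilities are our assumptions. A tree is either a leaf (variable or constant) or an operator over recursively generated subtrees; when a required variable leaf cannot be constructed, the failure propagates upward, mirroring how overall generation may fail.

```python
import random

def gen_tree(max_depth, variables, rng):
    """Randomly build a syntax tree of depth <= max_depth.

    A tree is a nested tuple; a leaf is ("var", name) or ("const", value).
    Returns None when a required leaf cannot be constructed.
    """
    # Base case: at depth 1 we must emit a leaf; earlier, we may do so randomly.
    if max_depth <= 1 or rng.random() < 0.3:
        if rng.random() < 0.7:  # try to return a (state or input) variable
            if not variables:
                return None     # no suitable variable exists: construction fails
            return ("var", rng.choice(variables))
        return ("const", rng.randrange(256))
    # Recursive case: pick a (hypothetical) binary operator and build subtrees.
    op = rng.choice(["add", "and", "eq", "concat"])
    left = gen_tree(max_depth - 1, variables, rng)
    right = gen_tree(max_depth - 1, variables, rng)
    if left is None or right is None:
        return None             # a failed subtree makes the whole tree fail
    return (op, left, right)

def depth(tree):
    """Depth of a generated tree (a leaf has depth 1)."""
    if tree[0] in ("var", "const"):
        return 1
    return 1 + max(depth(tree[1]), depth(tree[2]))
```

The real generator must additionally thread sort constraints through the recursion, which is precisely what makes leaf construction fallible when no variable of the required sort exists.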

4 Experimental Evaluation

Tested Tools. To evaluate whether FuzzBtor2 is practical, we choose two state-of-the-art word-level model checkers, AVR [17] and Pono [19], as the tools under test. Both checkers take Btor2 directly as an input format, and they won first and third place, respectively, in the 2020 Hardware Model Checking Competition [2].

Table 1. Overall results.
Table 2. Classification and statistics of error messages. The first type of error message of Pono has been confirmed by its developers.

Experimental Setup. We run FuzzBtor2 repeatedly with different parameters to generate a total of 200 test cases, of which 100 are array-free, i.e., without array variables (BV), and 100 include array variables (ABV). The FuzzBtor2 command used for the former is fuzzbtor2 --seed i --max-depth 4 --constraints 1 --bv-states 3 --arr-states 0 --max-inputs 3 --candidate-sizes 1..8. To generate Btor2 models with array variables, the command is fuzzbtor2 --seed i --max-depth 4 --constraints 1 --bv-states 2 --arr-states 1 --max-inputs 3 --candidate-sizes 1..8, where i takes values from 0 to 99. For every tested checker, the timeout for solving each instance is set to one hour.
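The 200 invocations above are straightforward to script; the following sketch only assembles the command lines exactly as given in the text (the looping and naming are ours, not part of the paper's artifact):

```python
# Build the 200 FuzzBtor2 command lines used in the evaluation:
# 100 BV cases (3 bit-vector states, no arrays) and
# 100 ABV cases (2 bit-vector states, 1 array state), seeds 0..99.
TEMPLATE = ("fuzzbtor2 --seed {i} --max-depth 4 --constraints 1 "
            "--bv-states {bv} --arr-states {arr} "
            "--max-inputs 3 --candidate-sizes 1..8")

bv_cmds = [TEMPLATE.format(i=i, bv=3, arr=0) for i in range(100)]
abv_cmds = [TEMPLATE.format(i=i, bv=2, arr=1) for i in range(100)]
```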

Correctness. We use catbtor, provided by btor2tools [21], to verify the correctness of the outputs of FuzzBtor2. All Btor2 files generated by FuzzBtor2 pass the check of catbtor, which means that all generated Btor2 models are syntactically legal. Moreover, neither of the two tested tools (AVR or Pono) returns error messages related to syntax issues in the input Btor2 files.

Results. We perform 200 calls to FuzzBtor2 and obtain 100 BV test cases and 98 ABV test cases. Two calls for ABV test cases fail due to the situation discussed in Sec. 3.2. The generated test cases are not large, with a maximum of 58 lines, a minimum of 22 lines, and an average of 39.2 lines. We use the 198 generated test cases to find bugs in AVR and Pono. All solving processes return results immediately, regardless of success or failure, except for one ABV case on which AVR times out. Table 1 presents the overall statistical results. Neither AVR nor Pono performs very well, since most of the test cases (157 and 127, respectively) trigger their bugs. Table 2 presents the classification and statistics of the error messages returned by the tested tools. We encounter 12 and 6 different types of error messages for AVR and Pono, respectively. As can be seen from Table 2, ABV test cases trigger more types of errors than BV test cases, which matches the fact that more code is covered when solving a case in a more complex theory. Considering both tables, AVR performs worse than Pono in the experiments: AVR solves fewer test cases and returns more types of error messages. Besides, the case on which AVR times out is solved (Safe) by Pono and is a Btor2 file of only 43 lines, so we suspect a performance issue in AVR.

5 Conclusion

We have presented FuzzBtor2, an open-source tool for generating random Btor2 files, whose generated test cases can trigger various errors in state-of-the-art word-level model checkers. Several directions for future work are being considered. First, once the easily triggered bugs in the tested tools are fixed, we could generate larger Btor2 files and, through experiments, filter out benchmarks that can be used for performance evaluation. Second, some keywords of Btor2 (output, fair, and justice) are not supported by the current FuzzBtor2, and we plan to extend its functionality to support them in future versions. Finally, as stated in Sec. 3.2, the set of syntax trees constructed by the generator of FuzzBtor2 is essentially a model checking problem independent of the Btor2 format. Therefore, it would be useful to print randomly generated model checking problems in other formats such as Smv [8] and Vmt [12].