Gazer-Theta: LLVM-based Verifier Portfolio with BMC/CEGAR (Competition Contribution)

Gazer-Theta is a software model checking toolchain that includes various analyses for state reachability. The frontend, Gazer, supports C programs through an LLVM-based transformation and optimization pipeline. Gazer includes an integrated bounded model checker (BMC) and can also employ the Theta backend, a generic verification framework based on abstraction refinement (CEGAR). At SV-COMP 2021, a portfolio of BMC, explicit-value analysis, and predicate abstraction is applied sequentially, in this order.


Verification Approach and Software Architecture
Gazer-Theta is a software model checking toolchain with two main components: Gazer, an LLVM-based frontend, and Theta, a generic model checking framework. Figure 1 gives an overview of the architecture and the verification approach.
Gazer. Gazer [7] is a verification frontend for C programs, written in C++17 and built on the LLVM compiler infrastructure. The input is a C program (possibly consisting of multiple source files), which is first translated to LLVM IR (intermediate representation) using the clang compiler. Next, various built-in and custom LLVM passes are executed to perform optimizations (e.g., inlining, constant propagation, assertion lifting) and transformations (e.g., adding traceability information) on the IR. The LLVM IR is then transformed into different variants of control flow automata (CFA), depending on the backend to be used.
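To make the notion of a CFA concrete, the following is a minimal Python sketch of a control flow automaton for a small counting loop, with edges labeled by a guard (assume) and an update (assign). The program, the location names, and the `Edge` and `reaches_error` helpers are illustrative assumptions; they do not reproduce Gazer's actual CFA format or API.

```python
from typing import Callable, Dict, List, NamedTuple

State = Dict[str, int]


class Edge(NamedTuple):
    source: str
    target: str
    guard: Callable[[State], bool]     # "assume" label of the edge
    update: Callable[[State], State]   # "assign" label of the edge


def keep(state: State) -> State:
    """Identity update for edges that only carry a guard."""
    return state


# Hypothetical CFA for the C fragment:
#   int x = input(); int y = 0;
#   while (x > 0) { x = x - 1; y = y + 1; }
#   assert(y != 3);
EDGES: List[Edge] = [
    Edge("init", "loop", lambda s: True, lambda s: {**s, "y": 0}),
    Edge("loop", "loop", lambda s: s["x"] > 0,
         lambda s: {"x": s["x"] - 1, "y": s["y"] + 1}),
    Edge("loop", "exit", lambda s: s["x"] <= 0, keep),
    Edge("exit", "error", lambda s: s["y"] == 3, keep),
    Edge("exit", "final", lambda s: s["y"] != 3, keep),
]


def reaches_error(x_input: int, max_steps: int = 1000) -> bool:
    """Run the CFA on one concrete input and report whether 'error' is hit."""
    loc, state = "init", {"x": x_input}
    for _ in range(max_steps):
        if loc == "error":
            return True
        enabled = [e for e in EDGES if e.source == loc and e.guard(state)]
        if not enabled:  # no outgoing edge is enabled: the run has ended
            return False
        # Guards in this example are mutually exclusive, so the run is deterministic.
        edge = enabled[0]
        loc, state = edge.target, edge.update(state)
    return loc == "error"
```

A verifier does not enumerate inputs as this interpreter does; it reasons symbolically about whether any input can drive the automaton into the error location. In this sketch, only the input 3 does.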
Gazer includes a built-in variant [5,7] of bounded model checking [2], relying on the z3 SMT solver [6]. The other supported backend is Theta (to be presented below). Currently, both backends provide analysis for reachability properties.
In the final step, the "raw" results of the backends are processed to produce a verdict (safe, unsafe, unknown) and a witness. Currently, Gazer only supports violation witnesses, both in a user-friendly syntax and in the format of SV-COMP. Furthermore, Gazer is also capable of generating executable test harnesses that can be used, e.g., in a debugger to reach the property violation.
Theta. Theta [8] is a generic and modular model checking framework written in Java 11, providing abstraction- and CEGAR-based analyses [4] for various formalisms, including CFA. Theta is highly configurable, supporting different abstract domains (such as explicit-value analysis [1] or predicate abstraction [3]) and refinement strategies, mostly based on interpolation (using SMT solvers such as z3 [6]). In the explicit-value analysis, only a subset of program variables is tracked, while predicate abstraction keeps track of logical facts and relationships instead of concrete values.
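The difference between the two abstract domains can be illustrated with a small Python sketch that maps concrete states to abstract ones. The function names, the example variables, and the predicates are assumptions chosen for illustration; this is not Theta's API, and it omits the refinement step that chooses which variables or predicates to track.

```python
from typing import Callable, Dict, Iterable

State = Dict[str, int]


def explicit_abstraction(state: State, tracked: Iterable[str]) -> Dict[str, int]:
    """Explicit-value domain: keep the values of tracked variables only."""
    return {var: state[var] for var in tracked}


def predicate_abstraction(
    state: State, predicates: Dict[str, Callable[[State], bool]]
) -> Dict[str, bool]:
    """Predicate domain: keep only the truth value of each tracked predicate."""
    return {label: holds(state) for label, holds in predicates.items()}


# Hypothetical predicates over two program variables.
PREDICATES = {
    "x < y": lambda s: s["x"] < s["y"],
    "x >= 0": lambda s: s["x"] >= 0,
}

s1 = {"x": 5, "y": 7, "tmp": 1}
s2 = {"x": 5, "y": 9, "tmp": 42}
s3 = {"x": 8, "y": 7, "tmp": 1}

# s1 and s2 collapse into the same abstract state in both domains,
# while s3 is distinguished from them in both.
same_explicit = explicit_abstraction(s1, ["x"]) == explicit_abstraction(s2, ["x"])
same_predicate = (
    predicate_abstraction(s1, PREDICATES) == predicate_abstraction(s2, PREDICATES)
)
```

Tracking fewer variables or predicates yields a coarser abstraction with fewer abstract states, at the price of possible spurious counterexamples that refinement then has to eliminate.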
Verification portfolio. Based on our preliminary experiments, at SV-COMP 2021 we apply a sequential portfolio consisting of three steps, as illustrated in Figure 2. The portfolio is implemented as a Python script that calls the tools described previously. First, bounded model checking is performed with a 150 s time limit, which, in our experience, can already solve many unsafe instances. If BMC is inconclusive, we move on to an explicit-value analysis with a 100 s limit, which can be effective for simpler, mostly deterministic programs. Finally, if the result is still unknown, we move on to the more heavyweight method of predicate abstraction. If any of the phases reports an unsafe result, we additionally generate an executable test harness from the counterexample and check whether the program actually reaches the property violation. This allows us to filter out some false positives (by reporting unknown instead of unsafe).
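The control flow of such a sequential portfolio can be sketched in Python as follows. This is a simplified illustration, not the actual competition script: the analyses are passed in as plain callables, the names `run_portfolio` and `harness_confirms` are hypothetical, and returning unknown immediately after an unconfirmed unsafe result is an assumption. Only the 150 s and 100 s limits are taken from the description above; no limit is stated for predicate abstraction, so the sketch passes `None`.

```python
from typing import Callable, List, Optional, Tuple

Verdict = str  # one of "safe", "unsafe", "unknown"
# An analysis receives its time limit in seconds (None = no explicit limit).
Analysis = Callable[[Optional[float]], Verdict]
Step = Tuple[str, Analysis, Optional[float]]


def run_portfolio(steps: List[Step],
                  harness_confirms: Callable[[str], bool]) -> Verdict:
    """Run the analyses in order until one of them is conclusive."""
    for name, analysis, limit in steps:
        verdict = analysis(limit)
        if verdict == "safe":
            return "safe"
        if verdict == "unsafe":
            # Replay the counterexample through an executable test harness;
            # an unconfirmed violation is reported as "unknown" (assumption:
            # the portfolio stops here instead of trying the next step).
            return "unsafe" if harness_confirms(name) else "unknown"
        # "unknown" (including timeouts): fall through to the next step.
    return "unknown"


def make_steps(bmc: Analysis, explicit: Analysis, predicate: Analysis) -> List[Step]:
    """Order and time limits as described in the text."""
    return [
        ("bmc", bmc, 150.0),
        ("explicit-value", explicit, 100.0),
        ("predicate-abstraction", predicate, None),
    ]
```

For example, with stub analyses, `run_portfolio(make_steps(lambda t: "unknown", lambda t: "unsafe", lambda t: "safe"), lambda name: True)` stops at the second step and never invokes predicate abstraction.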

Strengths and Weaknesses
Gazer-Theta currently targets reachability analysis, so we participate in the ReachSafety category, excluding the subcategories Arrays, Heap, and Sequentialized because they rely on features with limited support (e.g., pointers). The strength of the tool is  [7]. Furthermore, we observed that executable harnesses could rule out many (142) false positives. The weakness of Gazer-Theta is its limited support for certain features, such as arrays, bit-precise reasoning (only available for BMC), and pointers. We also observed that the LLVM IR representation often results in large CFA (e.g., many temporary variables due to SSA form), which makes reasoning via CEGAR harder (as witnessed, e.g., by the ECA subcategory). Currently, the tool produces empty correctness witnesses that meet only the syntactic requirements, but, surprisingly, most of them were accepted. Furthermore, our violation witnesses are quite "sparse" due to the heavy use of optimization passes, but some validators can still prove their correctness. The 13 false positive results are caused by unsupported library functions (related to floats) that are treated as external calls with undefined (arbitrary) behavior.

Tool Setup and Configuration
The competition contribution is based on Gazer v1.2.1 and Theta v2.5.0. Additionally, the BMC backend of Gazer uses z3 version 4.8.6, while Theta is based on z3 version 4.5.0. The projects' repositories contain instructions on building the tools, and an archive with pre-built binaries for Ubuntu 18.04 or 20.04 can be found on Zenodo. The toolchain requires the packages clang-9, libgomp1, llvm-9, openjdk-11-jre-headless, and python3 to be installed. The entry point of the toolchain is scripts/gazer_starter.py, which takes the verification task (C program) as its only mandatory input and runs the portfolio. No other parameters or configuration are required. Optionally, the output directory can be set (--output) and the version can be queried (--version).

Software Project