
1 Verification Approach

Runtime monitoring (RM) [1] is a lightweight approach to observing the executions of software systems and analyzing their behavior. The system (for simplicity, assume a single program) is executed and observed to obtain a trace of events. The observed events carry information about (a subset of) the actions performed by the program, such as memory accesses, function calls, or writes to the standard output. The trace is analyzed by a monitor that outputs verdicts, be it verdicts about some correctness property of the program or, e.g., information about resource consumption. Runtime enforcement [12] goes a step further and allows the monitor to alter the behavior of the program upon seeing some event or detecting a certain (usually faulty) behavior of the program.
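To make the setting concrete, the following is a minimal sketch of such a monitor in Python: it analyzes a finished trace of events and emits a verdict. The event kinds and the checked property are purely illustrative, not an actual RM framework.

```python
# A minimal sketch of a runtime monitor; event kinds ("open"/"close")
# and the checked property are illustrative assumptions.
from typing import Iterable, Optional, Tuple

Event = Tuple[str, str]  # (event kind, payload)

def monitor(trace: Iterable[Event]) -> Optional[str]:
    """Check the property 'every opened file is eventually closed'."""
    open_files = set()
    for kind, payload in trace:
        if kind == "open":
            open_files.add(payload)
        elif kind == "close":
            open_files.discard(payload)
    # Verdict after the trace ends: report a leaked resource, if any.
    return f"leaked: {open_files}" if open_files else None

print(monitor([("open", "a.txt"), ("open", "b.txt"), ("close", "a.txt")]))
```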

RM is traditionally applied as a method complementary to static analysis for finding bugs in computer programs. In Bubaak, we instead use RM to monitor and enforce the verifiers themselves rather than the analyzed program. The verifiers are manually modified to emit events about their internal actions, for example, that they have reached some part of the analyzed code or that they have discovered an invariant. The monitor gathers and analyzes these events and can decide to command a verifier to stop searching some parts of the program or to take into account an invariant found by another verifier.
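As an illustration of this setup, the sketch below maps verifier events to enforcement commands. The event kinds ("reached", "invariant") and the command names are hypothetical placeholders, not Bubaak's actual protocol.

```python
# A hedged sketch of monitoring the verifiers themselves; all event
# kinds and command names here are hypothetical, not Bubaak's protocol.
already_explored = {"loop@main:42"}  # code parts covered by some verifier

def verifier_monitor(event):
    """Map a verifier event to an enforcement command (or None)."""
    kind, data = event
    if kind == "invariant":
        # Broadcast the invariant so other verifiers can assume it.
        return ("assume", data)
    if kind == "reached" and data in already_explored:
        # Another verifier already covered this code; stop searching it.
        return ("prune", data)
    return None

print(verifier_monitor(("reached", "loop@main:42")))  # ('prune', ...)
print(verifier_monitor(("invariant", "x >= 0")))      # ('assume', ...)
```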

2 Bubaak at SV-COMP 2023

At SV-COMP 2023 [2], the verifiers we used are based on forward and backward symbolic execution.

(Forward) symbolic execution (SE) [14] is well known for being efficient at finding bugs. It aims to explore every feasible execution path of the analyzed program by building the so-called symbolic execution tree. This approach must fail if the SE tree is infinite or very large, a situation known as the path explosion problem. There are ways to prune paths from the SE tree that are known to exclude buggy behavior, e.g., using interpolation [13].
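The following toy example sketches how SE builds the tree by branching on path conditions and discarding infeasible paths with an SMT solver. It uses Z3, the solver employed by Bubaak's executors (see Section 3), but the program and its encoding are illustrative (requires `pip install z3-solver`).

```python
# A minimal sketch of forward SE on a toy program; not Slowbeast's or
# BubaaK-LEE's implementation.
import z3

def explore(path_cond, branches):
    """Recursively build the SE tree, pruning infeasible paths."""
    s = z3.Solver()
    s.add(path_cond)
    if s.check() != z3.sat:
        return  # infeasible path: not part of the SE tree
    if not branches:
        print("feasible path:", path_cond)
        return
    cond, rest = branches[0], branches[1:]
    explore(z3.And(path_cond, cond), rest)          # then-branch
    explore(z3.And(path_cond, z3.Not(cond)), rest)  # else-branch

x = z3.Int("x")
# Toy program: if (x > 0) { if (x < 0) { ... } }
explore(z3.BoolVal(True), [x > 0, x < 0])
```

With n branch conditions the tree has up to 2^n leaves, which is exactly the path explosion problem mentioned above.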

Backward symbolic execution (BSE) [11] is a form of SE that searches the program backwards, from error locations towards the initial locations. It has been shown [11] that BSE is equivalent to k-induction [16], another popular but incomplete verification technique. The incompleteness of BSE (and k-induction) is caused by the lack of information about reachable states. This deficiency can be tackled by providing (often trivial) invariants that supply the missing information [5]. These invariants can be computed externally before running BSE, or on the fly [4, 5, 11]. One of the on-the-fly methods is loop folding; the resulting technique is called BSELF [11].
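A single backward step can be illustrated by weakest-precondition substitution: to move a condition P backwards over an assignment x := e, compute P[e/x]. The sketch below shows this with Z3; it is a simplification for illustration, not Slowbeast's implementation (requires `pip install z3-solver`).

```python
# One BSE step over an assignment, sketched via weakest preconditions.
import z3

x = z3.Int("x")

def wp_assign(var, expr, post):
    """Weakest precondition of 'var := expr' w.r.t. a postcondition."""
    return z3.substitute(post, (var, expr))

# The error location is guarded by x == 10; step backwards over x := x + 1.
err = x == 10
pre = wp_assign(x, x + 1, err)
print(pre)  # x + 1 == 10, i.e., the state one step before the error

# BSE keeps stepping backwards; if the accumulated condition becomes
# unsatisfiable on every path, the error location is unreachable.
s = z3.Solver()
s.add(pre)
print(s.check())  # sat: the error is still reachable from this point
```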

SE and BSE(LF) are well suited for analyzing safety properties, but not for analyzing the termination of programs. To analyze this property, we have developed a new, not yet published algorithm that we dubbed TIIP: termination with inductive invariants with progress. The algorithm runs SE, searching for non-terminating executions by remembering and comparing the program states visited at loop headers. At the same time, it tries to incrementally compute (using a procedure similar to loop folding) an inductive invariant with progress for each visited loop. This invariant, if found, gives a pre-condition for the termination of the loop.
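Since TIIP is unpublished, the following only sketches its non-termination-search half, with concrete states standing in for symbolic ones: remember the states seen at a loop header and report a revisited state.

```python
# A hedged sketch of the non-termination search in TIIP; the real
# algorithm works with symbolic states, not concrete integers.
def find_lasso(step, state, max_iters=100):
    """Detect a repeated loop-header state, i.e., a non-terminating run."""
    seen = set()
    for _ in range(max_iters):
        if state in seen:
            return state  # revisited state: the execution cannot terminate
        seen.add(state)
        state = step(state)
    # Inconclusive; TIIP would instead try to compute an inductive
    # invariant with progress to prove that the loop terminates.
    return None

# Toy loop: while (x != 0) x = (x + 2) % 10;  started with x = 1
print(find_lasso(lambda x: (x + 2) % 10, 1))  # finds a repeated state
```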

At SV-COMP 2023, we run two SE instances and one BSELF instance in parallel when checking the properties unreach-call and no-overflow, SE and TIIP when checking termination, and just SE for memory-safety properties. Using multiple SE instances at the same time makes sense because we use different verifiers (see Section 3) and their SE implementations support different features.
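The property-to-portfolio mapping just described can be pictured as follows; the property names are SV-COMP's and the tool names match Section 3, but the exact configuration identifiers are made up for illustration.

```python
# A sketch of the portfolio selection; configuration names are assumed.
PORTFOLIO = {
    "unreach-call":    ["slowbeast-se", "bubaak-lee-se", "slowbeast-bself"],
    "no-overflow":     ["slowbeast-se", "bubaak-lee-se", "slowbeast-bself"],
    "termination":     ["bubaak-lee-se", "slowbeast-tiip"],
    "valid-memsafety": ["bubaak-lee-se"],
}

def select_verifiers(prop):
    # All verifiers selected for the property run in parallel.
    return PORTFOLIO.get(prop, [])

print(select_verifiers("termination"))
```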

Because all the algorithms that we use are based on symbolic execution, the enforcement done by the monitor would effectively prune the SE and BSE trees. Unfortunately, we did not manage to sufficiently debug this pruning, so it was disabled in the competition. As a result, Bubaak at SV-COMP 2023 only runs the analyses in parallel, without any coordination.

3 Software Architecture

Fig. 1. The setup of Bubaak at SV-COMP 2023. The colors indicate the properties that were checked by the different tools and algorithms.

The high-level scheme of Bubaak for SV-COMP 2023 is shown in Figure 1. Bubaak takes as input C files and a property file. Internally, it compiles and links the input files into a single LLVM bitcode file [7], which is also instrumented with the UBSan sanitizer [18] if the checked property is no-overflow. Then, verifiers are spawned according to the given property; when there are several, they all run in parallel. At SV-COMP 2023, we used Slowbeast for SE, BSELF, and TIIP, and BubaaK-LEE as another instance of SE.
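A minimal sketch of this front end, assuming clang and llvm-link are available on the system; the file names and the exact flags Bubaak passes are illustrative.

```python
# A hedged sketch of the compile-link-instrument pipeline; flags and
# file names are assumptions, not Bubaak's exact invocation.
import subprocess

def build_bitcode(c_files, prop, out="prog.bc"):
    cflags = ["-g", "-emit-llvm", "-c"]
    if prop == "no-overflow":
        # UBSan inserts overflow checks that the verifiers can target.
        cflags.append("-fsanitize=signed-integer-overflow")
    objs = []
    for f in c_files:
        obj = f + ".bc"
        subprocess.run(["clang", *cflags, f, "-o", obj], check=True)
        objs.append(obj)
    # Link all modules into a single bitcode file.
    subprocess.run(["llvm-link", *objs, "-o", out], check=True)
    return out
```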

Slowbeast [17] is a symbolic executor written in Python. It supports checking the properties unreach-call and no-overflow with SE, BSE, and BSELF, and termination with TIIP. The tool has no or only very limited support for the properties no-data-race, valid-memsafety, and valid-memcleanup.

BubaaK-LEE is a fork of the symbolic executor Klee [9], which is implemented in C++. The current version is a merge of upstream Klee and JetKLEE (the fork of Klee used in the tool Symbiotic [10]), with additional modifications. These modifications mostly concern modeling standard C functions, but they also include partial support for 128-bit wide integers and support for global variables with external linkage. BubaaK-LEE implements SE without any SE tree pruning and can check all SV-COMP properties except for no-data-race.

Both symbolic executors use Z3 [15] as the SMT solver. The features they support differ significantly, though. For example, Slowbeast supports, apart from BSE(LF) and TIIP, symbolic floating-point computations, threaded programs, and incremental solving, but it does not support symbolic pointers and addresses, which BubaaK-LEE does.
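For instance, incremental solving lets a symbolic executor reuse solver state between sibling branches via Z3's push/pop interface, as this small example shows; the constraints are arbitrary (requires `pip install z3-solver`).

```python
# A brief illustration of incremental solving with Z3's push/pop.
import z3

s = z3.Solver()
x = z3.Int("x")
s.add(x > 0)      # path condition shared by both branches
s.push()          # save the solver state before exploring a branch
s.add(x < 0)
print(s.check())  # unsat: this branch is infeasible
s.pop()           # backtrack; the solver keeps what it has learned
s.add(x > 10)
print(s.check())  # sat: the other branch is feasible
```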

The monitor is currently a part of the control scripts written in Python. At SV-COMP 2023, it monitors only the standard (error) output of the tools, as monitoring anything else is redundant until the implementation of data exchange between the verifiers and the monitor is finished. The only enforcement it performs at SV-COMP 2023 is terminating the analysis entirely.
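A minimal sketch of this behavior: watch each tool's output for a verdict line and terminate all processes once one is found. The verdict string is a hypothetical placeholder, not the tools' actual output format.

```python
# A hedged sketch of the current monitor: watch outputs, then enforce
# termination. "VERDICT" is an assumed marker, not the real output.
import subprocess, threading

def watch(proc, all_procs):
    for line in proc.stdout:
        if b"VERDICT" in line:       # this tool decided the task
            print(line.decode().strip())
            for p in all_procs:      # enforcement: stop the analysis
                p.terminate()
            return

def run_parallel(cmds):
    procs = [subprocess.Popen(c, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT) for c in cmds]
    threads = [threading.Thread(target=watch, args=(p, procs))
               for p in procs]
    for t in threads:
        t.start()
    for p in procs:
        p.wait()
```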

Differences to Symbiotic The tool Symbiotic [10] also uses Slowbeast and a fork of Klee, so a discussion of the differences between Bubaak and Symbiotic is in order. The version of Slowbeast used in Symbiotic is outdated, while Bubaak uses the most recent version (at the time of writing), in which a substantial part of the code has been rewritten and which contains new features, including the implementation of TIIP. The relation between BubaaK-LEE and JetKLEE is described earlier in this section.

There are further differences between Bubaak and Symbiotic: Bubaak does not use any pre-analyses, slicing, or instrumentation (apart from the UBSan instrumentation for the property no-overflow, for which Symbiotic uses its own instrumentation), and it runs the verifiers in parallel, whereas Symbiotic uses a sequential composition [10].

4 Strengths and Weaknesses

The combination of SE and BSELF has previously been shown to be promising [11]: SE can quickly analyze many programs, and BSELF then solves hard safe instances where SE found no bug or was unable to enumerate all paths. Running TIIP in parallel with pure SE has similar advantages. Still, all of SE, BSELF, and TIIP can be computationally very demanding, as the number of executions they must search may be enormous and/or their exploration may involve many non-trivial queries to the SMT solver.

Running multiple verifiers in parallel reduces wall time but consumes CPU time rapidly, which may be a disadvantage in SV-COMP. A remedy should be to finish the data-exchange support between the verifiers, which will allow them to avoid burning CPU time on duplicate work.

5 Results of Bubaak at SV-COMP 2023

Table 1. Number of benchmarks decided by individual verifiers per property.

The results of Bubaak were highly influenced by bugs in the implementation. The tool gave 41 wrong answers, 31 of them caused by a mistake in parsing the output of BubaaK-LEE (25 for the property valid-memcleanup and 6 for the property termination). The remaining 10 wrong answers were caused by miscellaneous bugs. After normalizing scores, these 41 wrong answers resulted in losing almost 10000 points in the overall score.

Further, BSELF did not decide a single benchmark because of a mistake in the command-line arguments used to invoke it. As a result, running Slowbeast was useful mainly in the category Termination, where TIIP was able to solve roughly half of the decided benchmarks (in the remaining cases, BubaaK-LEE successfully enumerated all execution paths). The numbers of decided benchmarks are summarized in Table 1.

Overall, Bubaak won the category Falsification-Overall, which confirms that SE is very good at finding bugs. The tool also took silver in the category SoftwareSystems, where it was the leading tool in several sub-categories.