Witness Validation (Competition Contribution)

The validation of violation witnesses is an important step during software verification. It hides false alarms raised by verifiers from engineers, which in turn helps them concentrate on critical issues and improves the verification experience. Until the 2021 edition of the Competition on Software Verification (SV-COMP), CPAchecker was the only witness validator for the ConcurrencySafety category. This article describes how we extended the Dartagnan verifier to support the validation of violation witnesses. The results of the 2022 edition of the competition show that, for witnesses generated by different verifiers, Dartagnan succeeds in the validation of witnesses where CPAchecker does not. Our extension thus improves the validation possibilities for the overall competition. We discuss Dartagnan's strengths and weaknesses as a validation tool and describe possible ways to improve it in the future.


Introduction
Most software verification tools report witnesses to property violations. Since SV-COMP 2015, there has been a common format in which witnesses are represented by automata [4]. Each edge of such an automaton is annotated with data that can be used to match program executions. A data annotation can be, e.g., "assumption", specifying constraints on the values of variables in a given state; "control", specifying the outcome of a branch condition; or "startline", specifying a concrete line in the source code. More details about data annotations and their semantics can be found in the exchange format documentation [1].
A witness validator checks that a violation can be reproduced using the information provided by the witness. Automata-based verifiers can easily be converted into validators by analyzing the synchronized product of the program with the witness automaton. In this setting, the witness automaton guides the verifier. If none of the outgoing edges of the current program state matches the next edge of the witness automaton, then the verifier cannot explore the current path further. If an outgoing edge matches, then the witness automaton and the program both proceed to their next states, eventually reaching a violation.
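The guiding role of the witness automaton can be illustrated with a minimal sketch. It assumes a simplified setting that is not taken from any particular validator: the program is a graph whose edges are labeled with source lines, the witness is a single path given as a list of "startline" values, and every program step must match the next witness edge. The names validate, program_edges, and witness_lines are hypothetical.

```python
from collections import deque


def validate(program_edges, initial, error_states, witness_lines):
    """Explore the synchronized product of a program graph and a witness path.

    program_edges: dict mapping a program state to a list of (line, next_state).
    witness_lines: source lines of the witness edges, in path order (assumed
                   to describe a single path; every step must match an edge).
    Returns True if a program path matching the whole witness ends in an
    error state.
    """
    start = (initial, 0)  # (program state, position in the witness)
    seen = {start}
    queue = deque([start])
    while queue:
        state, pos = queue.popleft()
        if pos == len(witness_lines):
            if state in error_states:
                return True  # witness fully consumed and violation reached
            continue
        for line, nxt in program_edges.get(state, []):
            # Only program edges that match the next witness edge are followed;
            # all other paths are pruned.
            if line == witness_lines[pos]:
                node = (nxt, pos + 1)
                if node not in seen:
                    seen.add(node)
                    queue.append(node)
    return False


# Hypothetical program: the branch taken at line 3 leads to an error at line 4.
program = {
    "s0": [(3, "s1"), (5, "s2")],
    "s1": [(4, "err")],
    "s2": [(6, "ok")],
}
confirmed = validate(program, "s0", {"err"}, [3, 4])
rejected = validate(program, "s0", {"err"}, [5, 6])
```

In this toy example the witness [3, 4] is confirmed, while [5, 6] is rejected because the path it describes ends in a non-error state.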
While this idea allows one to easily convert any automata-based verifier into a validator, not all verifiers are automata-based.
Dartagnan is an SMT-based verifier. In the next section, we explain how to convert it into a validator. The idea is to extract information from the witness and use it to reduce the search space explored by the backend SMT solver.

Validation Approach
Given a concurrent program and a specification in the form of assertions, Dartagnan generates an SMT formula ϕ_Ver = ϕ_Cf ∧ ϕ_Df ∧ ϕ_Sc ∧ ϕ, which is satisfiable if and only if some assertion fails [16,15]. The formulas ϕ_Cf and ϕ_Df encode the control flow and the data flow of the program, respectively. The formula ϕ_Sc encodes scheduling constraints. Finally, the last conjunct ϕ expresses that at least one assertion must fail. If ϕ_Ver is satisfiable, then a violation exists. The goal of Dartagnan (as a verifier) is to find such a violation. This amounts to finding an appropriate scheduling among the threads. Such a scheduling is encoded as a happens-before relation between the instructions. Dartagnan thus searches the space of all viable happens-before relations to find a violation or prove that none exists.
We now explain how to extend Dartagnan into a violation witness validator. The idea is to extract from the violation witness a formula ϕ that we conjoin with the rest of Dartagnan's encoding, resulting in ϕ_Val = ϕ_Ver ∧ ϕ. The extra constraints in the witness formula ϕ reduce the search space for the SMT solver. For the verification of concurrent programs taking inputs from the environment, there are two sources of non-determinism: the data coming from the input (which might influence the control flow) and the scheduling. The purpose of the witness formula is to reduce this non-determinism. Extending the SMT encoding as in ϕ_Val is conceptually easy. The interesting question is which information from the witness we should use. The less information we use, the more we move from pure validation to full verification.
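The effect of conjoining witness constraints can be illustrated with a small sketch. It does not use an SMT solver; instead it enumerates schedules explicitly, which is only feasible for toy programs. The program, the event names (a1, a2, b1), and the function search are hypothetical and chosen purely for illustration: thread A writes x = 1 and then x = 2, thread B reads x into r, and the assertion r != 1 is checked under sequential consistency.

```python
from itertools import permutations

# Hypothetical events: name -> (thread, kind, value written)
EVENTS = {
    "a1": ("A", "write", 1),
    "a2": ("A", "write", 2),
    "b1": ("B", "read", None),
}
# Program order inside thread A: a1 is executed before a2.
PROGRAM_ORDER = [("a1", "a2")]


def respects(order, pairs):
    """True if every pair (x, y) has x scheduled before y in the given order."""
    pos = {e: i for i, e in enumerate(order)}
    return all(pos[x] < pos[y] for x, y in pairs)


def violates(order):
    """Run the schedule and report whether the assertion r != 1 fails."""
    x, r = 0, None
    for e in order:
        _, kind, value = EVENTS[e]
        if kind == "write":
            x = value
        else:
            r = x
    return r == 1


def search(witness_hb=()):
    """Return (number of candidate schedules, list of violating schedules).

    witness_hb plays the role of the extra constraints taken from a witness:
    a collection of happens-before pairs that every candidate must respect.
    """
    candidates = [
        o for o in permutations(EVENTS)
        if respects(o, PROGRAM_ORDER) and respects(o, witness_hb)
    ]
    return len(candidates), [o for o in candidates if violates(o)]


full_verification = search()
guided_validation = search([("a1", "b1"), ("b1", "a2")])
wrong_witness = search([("b1", "a1")])
```

Without witness constraints, three schedules have to be considered and one of them violates the assertion. With the two happens-before pairs, a single candidate remains and it is the violating one. A witness that orders the read before both writes leaves one candidate and no violation, so it would be rejected.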
While automata-based validators can use some information in a straightforward manner, this is not the case for Dartagnan.

1. A violation witness can contain cycles to represent infinitely many executions. However, SMT-based tools unroll cycles and perform bounded verification, thus only part of this information is helpful.
2. Since Dartagnan (as many other BMC tools) does not keep an explicit notion of state, using state information is not trivial.
The exchange format for violation witnesses allows for expressing information about state assumptions, the control flow, and the scheduling. We abstract away the former two and only use scheduling information. We assume that witness automata represent a single path and that the edges contain "startline" data corresponding to read or write instructions. Those are the only instructions that can affect our happens-before relation. While we do not explicitly encode the outcome of control-flow instructions, certain control-flow information is implicitly encoded by which instructions are executed. In Section 3, we explain the reasons behind these design decisions and assumptions, discuss their limitations, and describe how we plan to address them in the future. Despite these limitations, and as we show in Section 4, our validator performs well in practice.
Let (S, E) be a witness automaton with states S and edges E. For each e ∈ E, the function e2i(e) returns the set of read or write instructions coming from the "startline" in the C file that corresponds to the given edge. Since witnesses represent single paths, they can be seen as words over S. Let w ∈ S* be a witness. We define the witness-to-formula function, which constructs ϕ as happens-before(i_1, i_2) if w = s · w
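The following sketch shows one plausible reading of this construction; it is an assumption for illustration, not the authors' definition. It assumes that the witness is given as the list of "startline" values along its single path, that a hypothetical mapping line_to_instructions stands in for e2i, and that every instruction of one edge is ordered before every instruction of the next edge. Constraints are returned as pairs rather than as an SMT formula.

```python
def witness_to_constraints(edge_lines, line_to_instructions):
    """Collect happens-before pairs between instructions of consecutive edges.

    edge_lines: "startline" values of the witness edges, in path order.
    line_to_instructions: hypothetical stand-in for e2i, mapping a source line
        to the read/write instructions generated from it.
    Returns a list of (i1, i2) pairs, each meaning happens-before(i1, i2).
    """
    groups = [line_to_instructions.get(line, []) for line in edge_lines]
    # Assumption: edges without read/write instructions add no constraints.
    groups = [g for g in groups if g]
    constraints = []
    for earlier, later in zip(groups, groups[1:]):
        for i1 in earlier:
            for i2 in later:
                constraints.append((i1, i2))
    return constraints


# Hypothetical example: line 20 yields two memory accesses.
mapping = {
    10: ["W(x)@10"],
    20: ["R(x)@20", "W(r)@20"],
    11: ["W(x)@11"],
}
pairs = witness_to_constraints([10, 20, 11], mapping)
```

For the path through lines 10, 20, and 11, this produces four pairs: the write at line 10 before both accesses at line 20, and both of those before the write at line 11. In an SMT-based setting, the conjunction of the corresponding happens-before constraints would form the witness formula.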

Strengths and Weaknesses
The main strengths of our validation approach are simplicity and modularity. The approach only requires adding a new sub-formula to the SMT encoding used for verification. The validator is modular in the sense that using more or different information from the witness does not change the validation approach. For example, adding information from the witness about the control flow just requires adding more constraints to the witness formula ϕ.
Our validation approach assumes that witness automata represent single paths. This is a limitation not imposed by the exchange format. However, verifiers tend to stop as soon as they find one violation and thus generate witnesses representing a single violation path.
A second limitation is that we do not explicitly consider control-flow information. This might impact the performance of the validation, since not all non-determinism is removed and the search space might still be large. Converting such control-flow information into SMT is simple in principle. However, since Dartagnan internally converts the C program into Boogie [14], matching conditionals with the corresponding assembly-like jumps requires some work. A second consequence of not extracting control-flow information from the witness is that we might validate witnesses that do not lead to a violation. This is because we over-approximate the paths of the program represented by the witness, and thus our approximation might include a path leading to the violation even if the witness did not.

Validation Results
We inspected the results of SV-COMP 2022 [5]. Of the 20 verifiers in ConcurrencySafety, we selected five tools implementing different verification approaches. We consider them good representatives of the whole category: (i) CBMC [12] (used as a backend by Deagle [8] and Lazy-CSeq [10]), (ii) CPAchecker [6] (used as a backend by CPA-Lockator [3] and Graves [13]), (iii) EBF [2] (combines BMC with fuzzing, a very effective technique to find bugs), (iv) Dartagnan [16] (the only tool where the memory model, here sequential consistency, is taken as an input), and (v) GemCutter [11] (shares the codebase with UTaipan [7] and UAutomizer [9]).
Table 1 presents the results of the validation in SV-COMP 2022. We report the number of witnesses generated by each verifier ("Witnesses"). For each of the validators (columns "Dartagnan" and "CPAchecker"), we report the number of cases where the validation conclusively finished (i.e., it returned True or False), whether the violation was confirmed (left of "/") or not (right of "/"), and the number of correct validations by one tool where the other did not report a result (columns "Dart \ CPA" and "CPA \ Dart", respectively).
For the SMT-based verifiers CBMC and EBF, Dartagnan has validation success rates of 63.28% and 75.52%, respectively (against 31.15% and 19.66% for CPAchecker). Unfortunately, Dartagnan did not validate any of the witnesses generated by CPAchecker. This was due to a bug in the witness parser that was identified and fixed after the competition. CPAchecker validated all the witnesses that it generated as a verifier. Dartagnan validated 89.74% of its own witnesses, while CPAchecker only validated 12.82% of them. For GemCutter, the validation success rate of Dartagnan is only 6.02%. This is because, due to another bug, Dartagnan wrongly marked 237 witnesses as not validated. The fixed version of Dartagnan is able to validate all such cases. Despite this, of the 18 GemCutter witnesses that Dartagnan validated, 15 were not validated by CPAchecker, thus improving the validation possibilities for the overall competition.

Software Project and Configuration
The project home page is https://github.com/hernanponcedeleon/Dat3M. To run Dartagnan as a validator, use the following command:
$ Dartagnan-SVCOMP.sh -witness <witness> <property> <program>