1 Introduction

Over the past years, automatic software validation (i.e., verification and testing) has become a mature field, with numerous tools providing various sorts of analyses (see, e.g., the annual competitions on software verification and testing [8, 9]). Still, the one-fits-all approach to software validation has not yet been found. All tools have their specific strengths and weaknesses, and tools efficiently solving one sort of analysis task might be slow at, or even unable to solve, other tasks.

To remedy this situation, cooperative software verification aims at having different tools cooperate on the task of software verification. This principle can be applied not only to verification but also to testing, and several different approaches combining various sorts of analyses exist today (e.g., [2, 5, 15, 16, 24, 27, 33, 34, 38, 39, 42, 48, 53,54,55, 75]). To achieve cooperation, tools need to exchange information gathered about a program during its analysis. To leverage the strengths of tools, we need to make sure that no results computed about a program are lost during this information exchange. To this end, existing cooperative approaches use various sorts of so-called verification artifacts [26] for information exchange, e.g., correctness witnesses [10], predicate maps [24] or violation witnesses [12]. These artifacts are, however, often specialized to the type of analysis performed, with the consequence of having to define a new form of artifact with every new cooperation.

Fig. 1 Cooperation of over- and under-approximating analyses

In this work, we introduce a novel uniform verification artifact (called GIA) for the exchange of information, specifically focusing on the cooperation of over- and under-approximating software analyses (see Fig. 1), as many existing combinations successfully make use of these two types of analyses (e.g., [1, 2, 5, 20, 21, 24, 30, 33,34,35, 41, 42, 49, 51, 59, 70, 77, 78]). Over-approximating (OA) analyses build an over-approximation of the state space of a program, while under-approximating (UA) analyses inspect specific program paths. An UA analysis typically aims at finding errors; an OA analysis aims at proving program correctness.

Before defining the GIA—our new type of verification artifact—we first studied existing combinations of (cooperative and non-cooperative) analyses and the information they assemble and possibly exchange during an analysis. We also investigated what input formats existing tools accept. The majority of tools just take a program as input; some tools, however, already allow for verification artifacts as additional inputs. With these insights at hand, we defined a new verification artifact in the form of a generalized information exchange automaton (GIA) which can express information generated by over- and under-approximating analyses in the context of software validation. More specifically, our artifact can encode information on (1) program paths which definitely or potentially lead to an error, i.e., (potential) counterexamples, (2) program paths which are already known to be safe, (3) program paths which are already known to be infeasible, plus (4) additional constraints on program paths like state invariants. The unification of all such information in one verification artifact should in particular make the artifact independent of its usage, i.e., the semantics of the GIA should be the same in all usage contexts within software validation. Current artifacts, in particular the protocol automata of Beyer and Wehrheim [26], have differing meanings depending on their usage: sometimes the paths described by an automaton are the safe paths, and sometimes the paths leading to a property violation. By introducing the idea of target nodes and drawing inspiration from three-valued logic, we can define the semantics of the verification artifact GIA in such a way that it can encode the different kinds of information exchanged in software validation while maintaining a uniform semantics.

Along with this new artifact, we also introduce two operations on it: reducers [18, 41] and combiners. A reducer (syntactically) reduces a program to the part which a (prior) analysis has not yet completed (e.g., not yet proven safe). Reducers are required for the cooperation of analysis tools which only take programs as inputs. A combiner merges the analysis results given in two GIAs into one. We formally show that connecting tools via reducers and combiners guarantees that computed analysis results are never lost.

To demonstrate the feasibility of our approach and to show that GIAs are in fact usable in different scenarios, we have implemented two such cooperations employing GIAs as an exchange format. We have experimentally evaluated these cooperations on benchmarks of SV-COMP [9] and report on the outcomes, in particular how existing drawbacks in cooperation approaches caused by information loss can be overcome with this new artifact. Moreover, we observe that encoding information on reachable and unreachable program paths within the same artifact allows cooperative approaches to compute final results faster.

This article is an extended version of our conference paper [56], extending it with (1) a thorough discussion of related work, especially on existing artifacts and their shortcomings, (2) proofs of the theorems, (3) an implementation and evaluation of an additional use case for GIAs, namely cooperative test case generation, and (4) a more detailed explanation of the application of GIAs in other use cases.

2 Background

We generally aim at the validation of programs written in C. To be able to discuss and define formats for the information exchange, especially their semantics, we first provide some basic definitions on the syntax and semantics of programs, and then survey existing artifacts.

2.1 Program syntax and semantics

We represent a program as a control-flow automaton (CFA). Intuitively, a CFA is a control-flow graph, where each edge is labeled with a program statement. More formally, a CFA \(C \) is a graph \(C =(Loc,\ell _0, G)\) with a set of program locations Loc, the initial location \(\ell _0 \in Loc\) and a transition relation \(G \subseteq Loc \times Ops \times Loc\), where Ops contains all possible operations on integer variables,Footnote 1 namely assignments, conditions (for both loops and branches), function calls and return statements. We let \({\mathcal {C}}\) denote the set of all CFAs. Note that any program can be translated into a CFA and any deterministic CFA into a program.
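To make the definition more tangible, the following minimal Python sketch shows one possible in-memory representation of a CFA with locations, an initial location, and labeled edges. All names are illustrative; this is not the representation used by any of the tools discussed later.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Edge:
    source: str   # program location the edge leaves, e.g. "l1"
    op: str       # operation label: assignment, condition, call, or return
    target: str   # program location the edge enters

@dataclass
class CFA:
    locations: set
    initial: str
    edges: set = field(default_factory=set)

    def successors(self, loc):
        """All edges leaving the given location."""
        return [e for e in self.edges if e.source == loc]

# A tiny CFA with one nondeterministic assignment and one branching point
# (locations and labels are made up for illustration only).
cfa = CFA(
    locations={"l0", "l1", "l2", "l3"},
    initial="l0",
    edges={
        Edge("l0", "x = random()", "l1"),
        Edge("l1", "[x == 0]", "l2"),   # condition edge of the then-branch
        Edge("l1", "[x != 0]", "l3"),   # condition edge of the else-branch
    },
)
print([e.op for e in cfa.successors("l1")])
```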

We assume the existence of two specific functions error and random which programs can call; the former can be used to represent violations of a specification (reachability of an error), the latter returns a non-deterministic value and is typically used to model inputs. We assume our programs to be deterministic except for this function random.

Fig. 2 An example program P for test case generation

For defining the semantics of CFAs, we let \( Var \) denote the set of all integer variables present in the program, \( AExpr \) the set of arithmetic and \( BExpr \) the set of Boolean expressions over the variables in \( Var \). A state \(c \) is a mapping of the program variables to integers, i.e., \(c: Var \!\rightarrow \! {\mathbb {Z}}\). We lift this mapping to also contain evaluations of the arithmetic and Boolean expressions, such that \(c \) maps \( AExpr \) to \({\mathbb {Z}}\) and \( BExpr \) to \({\mathbb {B}}=\{0,1\}\). A finite syntactic program path is a sequence \(\ell _0 \xrightarrow {g_1} \ell _1 \xrightarrow {g_2} \cdots \xrightarrow {g_n} \ell _n\) s.t. \((\ell _i,g_{i+1}, \ell _{i+1}) \in G\) for each transition. We extend a syntactic path to a semantic program path \(\pi = \langle c_0,\ell _0 \rangle \xrightarrow {g_1} \langle c_1,\ell _1 \rangle \cdots \xrightarrow {g_n} \langle c_n,\ell _n \rangle \) by adding states to each location, where \(c _0\) assigns the value 0 to all variables, and the state change from \(c_i\) to \(c_{i+1}\) is defined as follows: If \(g_{i+1}\) is an assignment of the form \(x\! = \!a\) with \( x\! \in \! Var \) and \(a\! \in \! AExpr \), then \(c_{i+1} = c_i[x \mapsto c_i(a)]\); for assignments \(x=\)random(), \(c_{i+1} = c_i[x \mapsto z]\) for some \(z \!\in \! {\mathbb {Z}}\); otherwise \(c_{i+1} = c_i\).

Note that we do not require that a semantic path meets all its Boolean conditions, as we want to distinguish between feasible and infeasible semantic paths: A semantic path is called feasible if for each condition \(g_{i+1}=b\) on the path \(c_i(b)= true\) holds; otherwise it is called infeasible. We say that a path \(\pi \) reaches location \(\ell \in Loc\) if \(\ell = \ell _n\). If no feasible semantic path reaches a location \(\ell \in Loc\), it is called unreachable. The set of all semantic paths (or in short, paths) of a CFA C is denoted by \({\mathcal {P}}(C)\).
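As an illustration of these definitions, the sketch below (Python, all names hypothetical) builds the states of a semantic path step by step and checks feasibility: every variable starts at 0, assignments update the state, random() edges consume the next value from a given input sequence, and the path is infeasible as soon as a condition evaluates to false in the current state.

```python
def step(state, op, inputs):
    """Successor state for one operation; op is ("assign", var, expr),
    ("random", var) or ("cond", expr). Expressions are Python expressions
    over the program variables (purely illustrative)."""
    if op[0] == "assign":
        _, var, expr = op
        return {**state, var: eval(expr, {}, dict(state))}
    if op[0] == "random":
        return {**state, op[1]: inputs.pop(0)}  # next nondeterministic input value
    return state  # conditions leave the state unchanged

def is_feasible(path, variables, inputs):
    """A semantic path is feasible iff every condition on it evaluates to
    true in the state reached just before it; all variables start at 0."""
    state = {v: 0 for v in variables}
    inputs = list(inputs)
    for op in path:
        if op[0] == "cond" and not eval(op[1], {}, dict(state)):
            return False
        state = step(state, op, inputs)
    return True

# Then-branch path: feasible when random() returns 0, infeasible for 7.
path = [("random", "x"), ("cond", "x == 0"), ("assign", "y", "x + 1")]
print(is_feasible(path, ["x", "y"], [0]))  # True
print(is_feasible(path, ["x", "y"], [7]))  # False
```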

Fig. 3 CFA for program in Fig. 2, where nodes after branching points are marked gray

Figure 2 contains a C program and Fig. 3 its corresponding CFA. Let us assume our validation task on this program is test case generation, more specifically generating test inputs (values returned by random) which cover all branches of the program. A tool would then need to generate inputs leading to paths such that each node of the CFA marked in gray is reached by at least one path. A feasible path reaching the location \(\ell _3\) exists, for instance, when random() returns 0. The location \(\ell _7\) is unreachable, as x is always greater than 5 at \(\ell _6\), and thus this branch cannot be covered.

3 Related work

Combining different analyses is commonly applied to enhance the performance in verification or test case generation. In verification, the goal is to check whether a program adheres to certain specifications; in test case generation, tools aim at finding test cases covering a set of test goals. Following [26], these combinations of analyses can be divided into four categories: portfolios, selection-based approaches, cooperations, and conceptual integrations. Portfolio-based approaches [7, 43, 50, 58, 61, 62, 83, 85] run multiple components sequentially or in parallel and select the first computed result, whereas selection-based approaches [6, 40, 44, 47, 68, 78, 84] select one verification component upfront based on the task. Neither concept foresees an information exchange between different components; hence, we do not consider them further.

In this work, we aim to find a unified artifact that is applicable to many existing concepts of cooperative software validation combining over-approximating (OA) and under-approximating (UA) components. Next, we present different cooperation-based approaches and approaches using a conceptual integration. Many of them combine OA and UA tools, but some combine only OA or only UA tools.

3.1 Conceptual integration

A conceptual integration is a white-box combination of multiple components, where the components exchange information not via clearly defined artifacts but rather using internal formats, method calls or accessing shared data structures.

Sequential Combinations. In sequential combinations, tools are executed in a sequential manner, where the information computed by the former tool is given to the next. Different approaches combine an OA verification tool with an UA testing approach (like dynamic symbolic execution [48] or robustness testing [63]) to guide the testing tool or analyze the non-verified program parts. FuSeBMC [3] combines different UA components for test case generation: In the first phase, a fuzzing and a bounded model checking tool are run in parallel trying to cover all test goals. Afterward, the covered goals and the inputs for covering them are given to a selective fuzzer, which uses the given information to cover the remaining test goals in the second phase.

Interleaved Combinations. Interleaved combinations [2, 4, 5, 46, 49, 51, 60, 64, 73, 76, 77, 81, 86] can be seen as an extension of the sequential combinations, where each component may be called multiple times. In Smash [49], an OA predicate analysis is combined with dynamic test generation (UA), wherein both tools compute information in an alternating way. The Smash algorithm maintains two sets of function summaries in the form of predicates and implications, computed by the OA and UA analyses. One set contains witnesses for concrete execution paths within the function, whereas the other summaries express certain properties (postconditions) that hold for all executions of the function satisfying certain preconditions. Synergy [51] (with its implementation in the tool Yogi [77]) and Dash [5] share the idea of combining predicate analysis with a testing approach. Both maintain two separate data structures, an over-approximation of the state space and a tree of concrete program executions. The core idea is to steer testing along potential counterexamples and use information obtained by testing to guide the refinement process. The Ufo algorithm follows a similar idea but stores all information within a single abstract reachability graph (ARG) [2].

The idea of concolic testing is to enrich a testing tool with concrete test inputs that may lead to unexplored parts of the program [28, 32, 71,72,73,74, 79, 80, 82]. The concrete inputs are computed using an over-approximating symbolic execution. Daca et al. [42] use a concolic execution engine in combination with predicate abstraction. The predicate abstraction guides the search of the concolic tester by identifying unreachable program parts. Besides this information, the concolic tester communicates the test goals already covered. Information is exchanged using an ARG.

Summary. Approaches using conceptual integration may exchange information on concrete program executions and the resulting goals covered, or the unreachability or safety of certain parts (under some Boolean conditions) of the program. Several approaches use an ARG for information exchange.

3.2 Cooperative approaches

In contrast to conceptual integration, cooperative approaches use components as black boxes and information is exchanged only using clearly defined verification artifacts.

Sequential Combinations. One of the earliest ideas for software validation in a cooperative manner is conditional model checking (CMC) [15]. Therein, so-called conditional model checkers (OA tools) are executed sequentially, where each generates a predicate specifying under which condition the program adheres to the specification. The conditions are exchanged via a condition automaton. In [41], the second model checker is replaced by a testing tool, yielding a combination of an OA and an UA tool. The information from the condition automaton is transformed into a reduced program to be able to use arbitrary testing approaches. The general construction of conditional verifiers using reduction is presented in [18], and different reduction and folding strategies are proposed in [17]. In [33] and [34], an OA verification tool analyzes the program, adding conditions under which the program is safe directly into the code. These conditions are then either analyzed further in [33] or tested by a dynamic symbolic execution engine (UA) in [34].

In CoVEGI [55], an OA analysis tool cooperates with an invariant generation tool. The invariant generation tool computes invariants on demand for the analysis tool to enhance its performance. The invariants are encoded within correctness witnesses.

Interleaved Combinations. CoVeriTest [16, 65, 66] generalizes the idea of [42] by combining arbitrary verifiers (OA) for test case generation. Each verifier tries to reduce the set of open test goals and generates a condition describing the explored state space, such that other tools can safely ignore it. The condition is then used for cooperation. A similar approach only employing testing tools is presented in [7].

Counterexample-guided abstraction refinement (CEGAR) [35] is a technique for iteratively refining an abstraction. The idea is implemented in many tools [1, 20, 21, 30, 59, 70, 78], where potential counterexamples, spurious counterexamples and precision increments (mostly in the form of new predicates) are exchanged between the components. A decomposed and cooperative formalization is presented in [24], where standardized formats, namely correctness and violation witnesses, are used for exchanging the information. The concept of property-directed k-induction [52, 69] is formalized in a cooperative way in [27], where the exchange of generated invariants and traces does not use a standardized format.

Summary. In cooperative software validation, components exchange information on programs partially verified (or simply explored) under certain conditions, (helpful) invariants, potential and spurious counterexamples, and newly discovered predicates. The information is encoded using the standardized formats of condition automata, correctness witnesses, and violation witnesses.

4 Existing artifacts

As seen in the related work section, there are different artifacts that are already used for cooperative validation, for witness validation, or for the storage of correctness proofs [10, 12, 18, 26, 42, 55, 67]. An overview of existing verification witnesses is given in [11]. In the following, we formally define the artifacts (1) protocol automaton, (2) violation witness, (3) correctness witness, (4) condition automaton, and (5) abstract reachability graph. We discuss their suitability for representing information exchanged between OA and UA analyses and provide concrete examples.

All of the presented formats can encode information about (non-)violation of some reachability properties, i.e., the (non-)reachability of a set \( Prop \subseteq Loc\) of locations of the CFA.

4.1 Requirements

Before we discuss whether the existing artifacts are suitable in the general setting, we summarize the requirements for such a general format for exchanging information between OA and UA analyses, based on existing use cases. Following existing cooperations, an artifact needs to be able to encode information on:

  1. (R1) program paths which are already known to be feasible (and may reach certain test goals or an error state),

  2. (R2) program paths which are either feasible and reach an error state or are infeasible (potential counterexample),

  3. (R3) program paths which are already known to be safe,

  4. (R4) program paths which are already known to be infeasible,

  5. (R5) additional constraints on program paths like state invariants,

  6. (R6) and additionally, an artifact needs to have a context-independent semantics.

4.2 Protocol automaton

The protocol automaton, first introduced in [12] and extended in [26], in general describes a set of semantic paths. It can be used to define different existing verification artifacts in a uniform way, as the semantics of the described paths is context-dependent.

Definition 1

A protocol automaton \(A_p\!=\!(Q,\!\Sigma ,\!\delta \!,q_0,\!F)\) for a program represented as CFA \(C\!=\!(Loc,\ell _0,G)\) is a non-deterministic automaton that consists of:

  • a finite set of states \(Q \subseteq \Omega \times BExpr \), each being a pair of a name out of some set \(\Omega \) and a state-invariant,

  • an alphabet \(\Sigma \subseteq 2^G \times BExpr \),

  • a transfer relation \(\delta \subseteq Q \times \Sigma \times Q\),

  • an initial state \(q_0 \in Q\), and

  • a set \(F \subseteq Q \) of final states.

Automaton states have (arbitrary) names and potentially invariants associated with them, which come in the form of Boolean expressions over program variables. Transitions are labeled over the alphabet \(\Sigma \), with elements being sets of transitions of the CFA plus additional assumptions about program variables describing conditions when executing these transitions (see Def. 3 below). The connection between a semantic path \(\pi \) in the CFA C and paths that are described by \(A_p\) is established via matched paths. \(A_p\) matches a path \(\pi = \langle c_0,\ell _0 \rangle \xrightarrow {g_1} \langle c_1,\ell _1 \rangle \cdots \xrightarrow {g_n} \langle c_n,\ell _n \rangle \) if there is a sequence \(\rho = (q_0,\psi _0) \xrightarrow {(G_1,\varphi _1)} (q_1,\psi _1) \cdots \xrightarrow {(G_k,\varphi _k)} (q_k,\psi _k)\), \(0 \!\le \! k\! \le \! n\), such that

  1. \(\forall i, 1 \le i \le k: g_i \in G_i\),

  2. \(\forall i, 1 \le i \le k: c_i \models \varphi _i\),

  3. \(\forall i, 0 \le i \le k: c_i \models \psi _i\).

\(A_p\) covers \(\pi \), if \(A_p\) matches \(\pi \), \(k=n\) and \(q_k \in F\).

In Figs. 4 to 6, three protocol automata are shown (a violation witness, a correctness witness, and a condition automaton; see below); each of them covers a set of paths from the CFA of Fig. 3.

To be able to represent different artifacts as protocol automata, a context-dependent semantics is used, meaning that the semantics is fixed per artifact instance. Thus, each tool working with protocol automata has to be aware of the type of protocol automaton given to it and its semantics. Depending on the encoded artifact, matched paths can, among other things, encode paths leading to a property violation (in Fig. 4) or paths not reaching any nodes from \( Prop \) (in Fig. 5). Consequently, it is impossible within one protocol automaton to both mark a path to a node from \( Prop \) as unreachable and state that another path reaches a different node from \( Prop \). Hence, (R6) and either (R1) or (R3) from Sec. 4.1 are not fulfilled.

Next, we discuss three artifacts which are specializations of protocol automata.

4.3 Violation witness

A violation witness [12] is used to encode a set of feasible semantic paths that lead to a property violation. It can be represented as a protocol automaton \(A_{{VW}}=(Q,\Sigma , \delta ,q_0,F)\), where each state has only a trivial state invariant: \(\forall (q,\varphi ) \in Q: \varphi = true\). The assumptions in \(A_{{VW}}\) can contain constraints on the variable values. Semantically, paths covered by \(A_{{VW}}\) contain a property violation. An example of a violation witness represented as a protocol automaton for the CFA from Fig. 3 with \( Prop =\{\ell _4\}\) is depicted in Fig. 4. The only path covered by \(A_{{VW}}\) is a path \(\pi \) reaching \(\ell _4\), where \(c_0= \{x \! \mapsto \! 0\}\) and \(c_1 = \{x \! \mapsto \! 1\}\). Hence, following \(\pi \) leads to a property violation in the example program P.

By design, the violation witness does not allow the use of state invariants. Thus, its semantics allows neither encoding that a path does not reach a node from \( Prop \) (i.e., is safe) or is infeasible, nor encoding a justification of this in the form of state invariants. Hence, (R3), (R4), and (R5) from Sec. 4.1 are not fulfilled.

4.4 Correctness witness

A correctness witness [10] is used to encode that a program is safe (no node from \( Prop \) is reachable). It can be represented as a protocol automaton \(A_{{CW}}=(Q,\Sigma , \delta ,q_0,Q)\), where all states are final states and each edge is labeled with trivial assumptions: \(\forall (q,(G,\psi ),q') \in \delta : \psi = true\). States may contain a state invariant that justifies why the program is correct. Semantically, paths covered by \(A_{{CW}}\) do not contain a property violation. An example of a correctness witness represented as a protocol automaton for the CFA from Fig. 3 with \( Prop =\{\ell _8\}\) is depicted in Fig. 5, where \(*\) denotes any operation from Ops. As \(\ell _8\) is unreachable, a correctness witness can be generated. In Fig. 5, the invariant \(x>5\) is associated with the state \(q_6\). As the condition \(c \models x>5\) holds for every state \(\langle c,\ell _6\rangle \) on a feasible path that is covered by \(A_{{CW}}\), \(x>5\) is in fact a justification for the unreachability of \(\ell _8\).

Correctness witnesses allow neither specifying the reachability of nodes from \( Prop \) nor encoding partial results. Therefore, encoding paths to nodes from \( Prop \), as well as marking that only certain paths of the program (and not the whole program) are safe, is impossible. Hence, (R1) and (R2) from Sec. 4.1 are not fulfilled.

4.5 Condition automaton

A condition automaton [18] states which semantic paths of the program have already been successfully verified and under which condition. It can be represented as a protocol automaton \(A_{{CA}}=(Q,\Sigma , \delta ,q_0,F)\), where each state has only trivial state invariants (\(\forall (q,\varphi ) \in Q: \varphi = true\)) and accepting states cannot be left (\(\forall (q_f,\cdot ,q) \in \delta : q_f \in F \Rightarrow q \in F\)). Semantically, paths covered by \(A_{{CA}}\) do not contain a property violation. In contrast to a correctness witness \(A_{{CW}}\), a condition automaton can contain assumptions, allowing one to specify unreachability under these assumptions. In Fig. 6, we depict a condition automaton \(A_{{CA}}\) for program P with \( Prop =\{\ell _4\}\), where \(*\) again denotes any operation from Ops. The partial result, generated, e.g., by a simple reachability analysis, covers all paths containing \(\ell _5\) and marks them as safe (under the trivial assumption true). Note that \(A_{{CA}}\) correctly encodes the information that a part of the program satisfies the property, even though the program contains a property violation (cf. Fig. 4).

Although condition automata can mark certain regions as safe, paths (potentially) leading to a node from \( Prop \) cannot be encoded. In addition, condition automata do not allow adding state invariants. Hence, (R2) and (R5) from Sec. 4.1 are not fulfilled.

4.6 Abstract reachability graph

An abstract reachability graph [14] represents the abstract state space computed by an analysis as a graph. It is used within different tools, e.g., CPAchecker [21]. As the ARG can be generated by any analysis, not necessarily one using predicates for the abstraction, it cannot be formalized as a protocol automaton. We define an ARG \(R=(N, succ, root, F, prec)\), with a set of abstract states N, a successor relation \(succ \subseteq N \times G \times N\), the initial node \(root\in N\), a set of frontier nodes \(F \subseteq N\) that still need to be explored, and a precision prec that describes the abstraction level of each state.

An example of an (intermediate) ARG generated by an interval analysis [37] and a location analysis [13] is depicted in Fig. 7. Each abstract state comprises a unique name, an interval for the variable x, and a location from the CFA. Frontier states are marked in gray; thus, the abstract state \(q_6\) is not yet fully explored. As the node \(q_5\) is explored and has only a single successor \(q_6\), the ARG also contains the information that \(\ell _7\) is unreachable, as no abstract state contains \(\ell _7\).

In general, the ARG can be used to represent all desired information that should be exchanged. Due to the analysis-dependent information, ARG states generated by different analyses (e.g., by interval analysis, live variable analysis or predicate abstraction) may however have different shapes, which makes an exchange of ARGs between different analyses in a general setting impossible.

Fig. 4 Violation Witness \(A_{{VW}}\) with \( Prop \!=\!\{\ell _4\}\) for P

Fig. 5 Correctness Witness \(A_{{CW}}\) with \( Prop \!=\!\{\ell _8\}\) for program P

Fig. 6 Condition Automaton \(A_{{CA}}\) with \( Prop \!=\!\{\ell _4\}\) for program P

Fig. 7 ARG generated by an interval analysis with \( Prop \!=\!\{\ell _8\}\) for program P. Frontier states are marked in gray

Summary. In summary, none of the existing artifacts is able to encode all desired information while being usable independently of the employed tools and maintaining one semantics. Next, we introduce a new format that overcomes these limitations.

5 Validation artifact GIA

In this work, we focus on two different validation tasks on programs, verification and test case generation, performed by over- and under-approximating analyses. For verification, the goal is to show the non-reachability of certain error locations. To this end, we fix a safety property \(S\!=\!(\ell ,\omega )\) as a pair of a location \(\ell \in Loc\) and a condition \(\omega \in BExpr \) which has to hold at \(\ell \). In practice, this is encoded in the CFA using two edges: an edge labeled with the condition \(\lnot \omega \) leads from \(\ell \) to a fresh location whose single outgoing edge is labeled with a call to error. Note that there can be multiple safety properties for a program. For test case generation, the goal is to find paths from \(\ell _0\) reaching all locations from a set \(L_{cover}\), containing, e.g., each branch or statement in the program (branch or statement coverage) or certain function calls, especially calls to error. To specify these paths, a sequence of return values for the calls to random (called a test suite) suffices (as random models inputs to programs).

For cooperation, we prefer a uniform way of describing these tasks, which we obtain by introducing the notion of target nodes, denoted by L, \(L \!\subseteq \!Loc\). A target node is a node that either has a single outgoing edge labeled error (for verification) or is in \(L_{cover}\) (for test case generation). We can now reformulate the two tasks: the goal of verification is to show that no target node is reachable; the goal of test case generation is to find a test suite such that all target nodes are reached. In Fig. 3, the target nodes for test case generation are \(L~=~\{\ell _3,\ell _5,\ell _7,\ell _9\}\).
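The reformulation via target nodes can be computed directly on a CFA representation; the following hedged sketch (edges as (source, op, target) triples, names purely illustrative) collects all locations whose single outgoing edge calls error together with the locations of a given coverage set.

```python
def target_nodes(locations, edges, l_cover=frozenset()):
    """Target nodes: locations whose only outgoing edge is a call to error()
    (verification), plus all locations in the coverage set (test generation).
    `edges` is a set of (source, op, target) triples; names are illustrative."""
    targets = set(l_cover)
    for loc in locations:
        out = [op for (src, op, _) in edges if src == loc]
        if len(out) == 1 and out[0].strip() == "error()":
            targets.add(loc)
    return targets

edges = {("l3", "error()", "l4"), ("l0", "x = random()", "l1"),
         ("l1", "[x == 0]", "l2"), ("l1", "[x != 0]", "l3")}
print(target_nodes({"l0", "l1", "l2", "l3", "l4"}, edges, l_cover={"l2"}))
```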

Our next objective is to define an artifact with one semantics that is valid for most types of exchanged information. In general, UA (under-approximating) and OA (over-approximating) tools either aim at showing that target nodes are reachable (for example a call to error or a branch that needs to be covered) or that (a part of) the program does not reach any target node (i.e., the program is safe). The overall goal is achieved when for each target node either a path reaching it is found or the node is proven unreachable.

Summarizing Sec. 4.1, the information exchanged between UA and OA tools thus needs to be about (1) feasible paths definitely leading to a target node (R1), (2) paths definitely not leading to a target node (either because they do not reach one or because they are infeasible; (R3) and (R4)), and (3) candidate paths potentially leading to target nodes and hence interesting to consider for the analysis, but for which the definite result is unknown so far (R2). The latter information is used in two cases: when an UA tool has not yet covered a path, either due to resource/time limitations or because it is infeasible, and when an OA tool has discovered a path to a target node which might be feasible. In addition, we need the artifact to be able to pass helpful information about invariants of program locations or constraints about program transitions (R5). All information needs to be encoded while maintaining one fixed, context-independent semantics (R6).

So far, none of the existing artifacts discussed in Sec. 4 is able to encode all this information while maintaining one semantics for the automaton. Inspired by the idea of three-valued logics (e.g., for three-valued model checking [29]), we extend the condition automata of [18] by introducing three different, disjoint sets of accepting states, one for each type of exchanged information.

Definition 2

A generalized information exchange automaton for over- and under-approximative analysis (GIA) \(A\!=\!({\mathcal {Q}}, \Sigma , \delta , q_0, F_{{ {ut}}},F_{{ {rt}}},F_{{cand}})\) consists of

  • a finite set \({\mathcal {Q}} \subseteq \Omega \times BExpr \) of states (each being a pair of a name of some set \(\Omega \) and a Boolean condition) and an initial state \((q_0,true) \in {\mathcal {Q}}\),

  • an alphabet \(\Sigma \subseteq 2^G \times BExpr \),

  • a transition relation \(\delta \subseteq {\mathcal {Q}} \times \Sigma \times {\mathcal {Q}}\), and

  • three pairwise disjoint sets of accepting states: \(F_{{ {ut}}}\) (for unreachable targets), \(F_{{ {rt}}}\) (for reachable targets) and \(F_{{cand}}\) (for candidates).

Intuitively, a GIA is an extension of a condition automaton (and thus of a protocol automaton from Def. 1) that has three different sets of accepting states and allows to specify state invariants.

We let \({\mathcal {A}}\) denote the set of all GIAs. When drawing automata, we use \(*\) to denote an edge that matches any operation from Ops. We additionally require for each GIA that (1) each state in the sets of accepting states \(F_{{ {ut}}}\) and \(F_{{ {rt}}}\) has no transitions to states outside \(F_{{ {ut}}}\) (resp. \(F_{{ {rt}}}\)) and (2) each accepting state from \(F_{{cand}}\) has a transition to itself.Footnote 2 More formally, we require the following (a small illustrative check of these conditions is sketched after the list):

  1. \(\forall q_{ {ut}}\!\in \!F_{{ {ut}}}: \lnot \exists q \!\in \! {\mathcal {Q}}: (q_{ {ut}}, op, q) \!\in \! \delta \wedge q \!\notin \!F_{{ {ut}}}\),

  2. \(\forall q_{ {rt}}\! \in \!F_{{ {rt}}}: \lnot \exists q\! \in \! {\mathcal {Q}}: (q_{ {rt}}, op, q)\! \in \! \delta \wedge q \!\notin \! F_{{ {rt}}}\),

  3. \( \forall q_{cand}\!\in \! F_{{cand}}: (q_{cand}, *, q_{cand}) \!\in \!\delta \).

Fig. 8 A GIA generated during cooperative test case generation for the example program of Fig. 2 with states of \(F_{{ {ut}}}\) marked green, of \(F_{{ {rt}}}\) blue and of \(F_{{cand}}\) yellow. We elide state invariants (all true) and depict for transitions only the operation and non-true conditions

Figure 8 depicts an example of a GIA for the program of Fig. 2 with target nodes \(L\!=\!\{\ell _3, \! \ell _5, \!\ell _7,\!\ell _9\}\), where \(F_{{ {rt}}}\!=\!\{q_3\},\) \(F_{{ {ut}}}\!=\!\{q_7\}\) and \(F_{{cand}}\!=\!\{q_5,q_9\}\).

To fulfill requirement (R6) from Sec. 4.1, we need to define a context-independent semantics. Thus, the three sets of accepting states are employed to describe three different languages of a GIA: the sets of paths leading to (1) \(F_{{ {ut}}}\), (2) \(F_{{ {rt}}}\), and (3) \(F_{{cand}}\). We first define what it means that an automaton covers a path, which is similar to the covering relation of condition automata and thus protocol automata. Covered semantic paths are used to establish a connection between the information encoded within the GIA and the program represented as a CFA.

Definition 3

A GIA \(A=({\mathcal {Q}}, \Sigma , \delta , q_0, F_{{ {ut}}},F_{{ {rt}}},F_{{cand}})\) covers a path \(\pi =\langle c_0,\ell _0 \rangle \xrightarrow {g_1} \langle c_1,\ell _1 \rangle \cdots \xrightarrow {g_n} \langle c_n,\ell _n \rangle \) if there is a sequence \(\rho = (q_0,\psi _0) \xrightarrow {(G_1,\varphi _1)} (q_1,\psi _1) \cdots \xrightarrow {(G_k,\varphi _k)} (q_k,\psi _k)\) (called run), \(0 \le k \le n\), such that

  1. \(q_k \in F_{{ {ut}}}\cup F_{{ {rt}}}\cup F_{{cand}}\),

  2. \(\forall i, 1 \le i \le k: g_i \in G_i\),

  3. \(\forall i, 1 \le i \le k: c_i \models \varphi _i\),

  4. \(\forall i, 0 \le i \le k: c_i \models \psi _i\).

We say that A X-covers \(\pi \), \(X \in \{ut,rt,cand\}\), when \(q_k \in F_X\).

In contrast to protocol automata, we allow the run \(\rho \) to have fewer states than the path \(\pi \), as each state from \( F_{{ {ut}}}\cup F_{{ {rt}}}\cup F_{{cand}}\) has a transition to itself. Depending on the parameter value for X-cover, we define three sets of paths (languages) of a GIA A: \({\mathcal {P}}_{ {ut}}(A), {\mathcal {P}}_{ {rt}}(A)\) and \({\mathcal {P}}_{cand}(A)\). These three sets are then used to establish the connection between a GIA A and a CFA C: If, e.g., a path \(\pi \!\in \!{\mathcal {P}}(C)\) reaches a target node \(\ell \) and \(\pi \! \in \!{\mathcal {P}}_{ {rt}}(A)\), then \(\ell \) is denoted as reachable by A. The GIA depicted in Fig. 8 thus contains the information that \(\ell _3\) is reachable when the condition \(x=0\) holds, that \(\ell _7\) is unreachable, and that \(\ell _5\) and \(\ell _9\) are candidates for being reached when the condition \(x=5\) holds.
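To illustrate Def. 3 and the resulting languages, the following simplified sketch (Python; a direct but naive reading of the definition, not the actual implementation) checks whether a GIA X-covers a given semantic path. It reuses the illustrative encoding of states, invariants, and transitions from the earlier GIA sketch; conditions and invariants are Python expressions over the program variables.

```python
def x_covers(path, gia, x):
    """Check whether the GIA x-covers the semantic path, x in {"ut","rt","cand"}.
    `path` is a list of (state, cfa_edge) pairs; path[0] is the initial state
    paired with None. `gia` is a dict with "initial", "invariants" (state ->
    condition), "transitions" (state -> list of (edge_set, condition, target))
    and the accepting sets "ut", "rt", "cand"; "*" matches any CFA edge."""
    def holds(expr, state):
        return bool(eval(expr, {}, dict(state)))

    frontier = {gia["initial"]}                     # automaton states after k steps
    for k, (state, _) in enumerate(path):
        # keep only automaton states whose invariant holds in the current state
        frontier = {q for q in frontier
                    if holds(gia["invariants"].get(q, "True"), state)}
        if frontier & gia[x]:                       # some q_k lies in F_X
            return True
        if k + 1 == len(path):
            break
        next_state, next_edge = path[k + 1]
        frontier = {t for q in frontier
                    for (edges, cond, t) in gia["transitions"].get(q, [])
                    if (edges == "*" or next_edge in edges)
                    and holds(cond, next_state)}
    return False
```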

With these definitions at hand, we can formally define the correctness of the analysis information in a GIA. Thereby, we are able to later on reason about the correctness of combinations of tools in a cooperative setting.

Definition 4

Let A be a GIA, C a CFA and \(L \subseteq Loc\) a set of target nodes. A is said to be correct wrt. C and L if \({\mathcal {P}}_{ {ut}}(A) \subseteq \{ \pi \in {\mathcal {P}}(C) \mid \pi \) is infeasible or \(\pi \) is feasible and reaches no \(\ell \in L \}\) and \({\mathcal {P}}_{ {rt}}(A) \subseteq \{ \pi \in {\mathcal {P}}(C) \mid \pi \) is feasible and reaches some \(\ell \in L \}\).

Correctness thus means that the automaton correctly (according to the program) marks paths as infeasible, as reaching no target node, or as reaching some target node. Similarly, we can define the soundness of an OA or UA analysis, assuming that the target nodes L are encoded within the program C. Soundness is also needed to reason about the correctness of combinations of tools in a cooperative setting.

Definition 5

Let tool be an OA or UA analysis producing a GIA as output, i.e., we assume the tool to encode a mapping \(\textsf{tool}: {\mathcal {C}} \times {\mathcal {A}} \rightarrow {\mathcal {A}}\).

If tool is an OA analysis, it is sound whenever for all \(A, A' \in {\mathcal {A}}\), \(C \in {\mathcal {C}}\) with \(\textsf{tool}(C,A) = A'\) we have

  • \({\mathcal {P}}_{ {ut}}(A') \supseteq {\mathcal {P}}_{ {ut}}(A)\) and \({\mathcal {P}}_{ {rt}}(A') = {\mathcal {P}}_{ {rt}}(A)\), and

  • \(\forall \pi \in {\mathcal {P}}_{ {ut}}(A') {\setminus } {\mathcal {P}}_{ {ut}}(A)\): \(\pi \) is an infeasible path of C or is feasible but reaches no \(\ell \in L\).

If tool is an UA analysis, it is sound whenever for all \(A, A' \in {\mathcal {A}}\), \(C \in {\mathcal {C}}\) with \(\textsf{tool}(C,A) = A'\) we have

  • \({\mathcal {P}}_{ {rt}}(A') \supseteq {\mathcal {P}}_{ {rt}}(A)\) and \({\mathcal {P}}_{ {ut}}(A') = {\mathcal {P}}_{ {ut}}(A)\), and

  • \(\forall \pi \in {\mathcal {P}}_{ {rt}}(A') {\setminus } {\mathcal {P}}_{ {rt}}(A)\): \(\pi \) is a feasible path of C reaching some \(\ell \in L\).

Consequently, a sound tool always generates correct GIAs when started with a correct GIA.

Finally, we can define when verification or test case generation is completed, namely when a correct GIA A is generated for a CFA \(C=(Loc,\ell _0, G)\) such that for each target node t there exists some \(\pi \in {\mathcal {P}}_{ {rt}}(A) \cup {\mathcal {P}}_{ {ut}}(A)\) that reaches t (all target nodes covered or unreachable).
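Phrased over explicit finite sets of covered paths, this completion criterion amounts to a one-line check; the sketch below is only an illustrative approximation (path languages are infinite in general), and all names are hypothetical.

```python
def validation_complete(targets, rt_paths, ut_paths, reaches):
    """All target nodes are either reached by an rt-covered path (a test or
    counterexample exists) or reached by a ut-covered path (proven unreachable).
    `reaches(path, target)` decides whether a path reaches the target node."""
    return all(any(reaches(p, t) for p in rt_paths | ut_paths) for t in targets)

# Toy usage: paths are tuples of visited locations.
reaches = lambda path, t: t in path
print(validation_complete(
    {"l3", "l7"},
    rt_paths={("l0", "l1", "l3")},            # l3 covered by a feasible path
    ut_paths={("l0", "l1", "l6", "l7")},      # path to l7 shown infeasible
    reaches=reaches))                         # True
```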

Fig. 9 Cooperative Test Case Generation using GIA as exchange formats

6 Using GIAs in cooperative validation

The basic idea of cooperation is to store analysis results computed by one tool in an artifact and let another tool start its work using this additional information. We next briefly summarize the existing approaches of cooperative validation presented in Sec. 3 and explain how they could make use of GIAs. Note that not all forms of cooperation make use of both OA and UA components; some may combine only OA or only UA tools. In these cases, we are still able to use GIAs as an exchange format within the cooperation.

Cooperative Test Case Generation. The goal of test case generation is the computation of a test suite leading to paths covering all target nodes. This can be implemented as a cooperation of an UA analysis Tester (e.g., concolic execution) with an OA analysis Verifier (e.g., bounded model checking) as depicted in Fig. 9. Tester is responsible for generating the test suite and Verifier for identifying unreachable target nodes. Hence, Tester reports in a GIA within \({\mathcal {P}}_{ {rt}}\) the set of already found paths to targets, where the concrete variable values used for following these paths are added as assumptions, and in \({\mathcal {P}}_{{cand}}\) the set of not yet covered target paths; Verifier tries to show infeasibility of paths in \({\mathcal {P}}_{{cand}}\) and, if it succeeds, moves these into \({\mathcal {P}}_{ {ut}}\). Next, Tester continues on the remaining targets, and this cycle continues until all target nodes are covered by the test suite. In addition, Verifier might add assumptions on program transitions to guide Tester to uncovered targets.
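One possible orchestration of this cycle is sketched below; run_tester and run_verifier stand in for the UA and OA components (both consuming and producing GIAs) and, like the attribute names on the GIA object, are assumptions for illustration, not an existing tool interface.

```python
def cooperative_testgen(program, targets, run_tester, run_verifier, max_rounds=10):
    """Alternate an under-approximating Tester and an over-approximating
    Verifier until every target node is rt-covered (test input found) or
    ut-covered (proven unreachable), or the round limit is reached."""
    gia = None                                  # no prior information yet
    for _ in range(max_rounds):
        gia = run_tester(program, gia)          # adds covered targets to P_rt,
                                                # open targets to P_cand
        if targets <= gia.rt_targets | gia.ut_targets:
            break
        gia = run_verifier(program, gia)        # moves infeasible candidates to P_ut
        if targets <= gia.rt_targets | gia.ut_targets:
            break
    return gia
```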

This form of analysis has been proposed by Daca et al. [42] as a conceptual integration using an ARG for information exchange and can be realized using GIA in a cooperative setting. There exist other cooperative approaches for test case generation, namely CoVeriTest [16, 65, 66] and conditional testing [7]. In contrast to the approach depicted in Fig. 9, conditional testing runs two different UA approaches either cyclic or in a sequence. Although it is strictly speaking not a combination of OA and UA approaches, we can also realize the cooperation using GIA as an exchange format following the same idea and encoding all found test cases within \({\mathcal {P}}_{ {rt}}\). In CoVeriTest, two OA analyses are combined for test case generation, each of them reporting the candidate test cases within \({\mathcal {P}}_{{cand}}\) and explored paths in \({\mathcal {P}}_{{ {ut}}}\). Each of them is equipped with a UA tool that validates all candidates and stores them within \({\mathcal {P}}_{{ {rt}}}\).

Cooperative Verification Using CEGAR.

Fig. 10 Component-based CEGAR using GIA as exchange formats

The goal of software verification is to show that none of the target nodes are reachable. CEGAR is a scheme that is commonly used in software verification. In [24], the scheme has been presented in a decomposed version applicable for cooperative verification, called CC-Wit. Therein, an abstract model explorer uses a given precision in the form of predicates to explore the state space, searching for a feasible path to a target node. If no such path is detected, the program is safe. Otherwise, a potential counterexample is given to a feasibility checker, which checks if the counterexample is spurious. If a real counterexample is found, the verification stops with the outcome "not safe"; otherwise, a precision refiner is started to refine the abstraction by generating new predicates. In [24], violation and correctness witnesses are used for exchanging the information. A unification of the information exchanged within CEGAR using GIAs is depicted in Fig. 10 and called CC-Gia. The Abstract Model Explorer is an OA component building an abstraction of the state space of the program; it reports the candidates for counterexamples within \({\mathcal {P}}_{{cand}}\) and may also mark explored safe paths within \({\mathcal {P}}_{{ {ut}}}\). The Feasibility Checker is an UA component that inspects the candidates for counterexamples and moves them to \({\mathcal {P}}_{ {rt}}\) when it can show them to be real. Otherwise, the path is marked as a candidate for being infeasible by the Feasibility Checker and given to the Precision Refiner, the second OA component within CC-Gia. Its task is to find a set of predicates showing the infeasibility of the spurious path. Once the Precision Refiner has computed the new predicates, the path is moved to \({\mathcal {P}}_{{ {ut}}}\), where the predicates are given as state invariants along the path.

A similar combination of components appears in [5, 77] in a non-cooperative form, where precision refiner and abstract model explorer are a single component. Nevertheless, one could realize these concepts in a cooperative setting as described above.

Cooperative Verification via Conditional Model Checking.

Fig. 11 Conditional model checking using GIAs as exchange format

In contrast to the approaches discussed earlier, the concept of conditional model checking [15, 18] foresees a sequential combination of multiple conditional model checkers, where each of them is an OA tool. Information is exchanged using condition automata. Although the original combination consists of OA tools only, we can realize CMC using GIA, as depicted in Fig. 11. Each conditional verifier reports the partial verification result within \({\mathcal {P}}_{ {ut}}\) using conditions. The next one continues working on the remaining target nodes. In [41], the second conditional verifier is replaced by a testing tool, yielding a cooperation between OA and UA tools.

Cooperation on Invariant Generation. In CoVEGI [55], an OA analysis (the Main Verifier) is supported by a Helper Invariant Generator, as depicted in Fig. 12. The task of the helper invariant generator is to compute loop invariants for specific locations. As a loop invariant is an over-approximation of the concrete loop executions, the helper invariant generator is also an OA component. The Main Verifier generates a GIA, where it reports within \({\mathcal {P}}_{{cand}}\) all paths from the program entry through the loop for which the invariant is requested. Thereby, these paths are marked as candidates for leading to a target node. The helper invariant generator is now asked to compute predicates, more precisely a loop invariant, showing that the paths are in fact infeasible. These invariants are encoded as state invariants for the head of the loop. By encoding the task of invariant generation in this way, we see that a Helper Invariant Generator solves the same task as a Precision Refiner in CEGAR.

Using GIAs in other forms of cooperation. For using GIAs either to decompose an existing conceptual integration or to build a novel form of cooperation, a component-wise procedure is advisable. First, each component needs to be classified as either OA or UA. In general, each component within the cooperation solves a certain task, e.g., proving that certain paths are infeasible, finding a concrete execution or a concrete path to a specific location, or generating a new abstraction in the form of predicates for a set of paths. For using GIAs as an exchange format between the components, (1) the task that should be solved needs to be encoded within a GIA with respect to reachability and (2) the computed answer has to be stored within the GIA. For the former, one should use a set of paths within \({\mathcal {P}}_{{cand}}\), either using all target nodes or only a specific one if the component should focus on a specific path while completing its task. For the latter, the component can either move (some) paths to \({\mathcal {P}}_{{ {ut}}}\) or \({\mathcal {P}}_{{ {rt}}}\), depending on whether it is OA or UA, or leave the paths unchanged and only add additional information in the form of path constraints or state invariants to them.

Fig. 12 Cooperation on invariant generation (CoVEGI) using GIAs as exchange format

7 GIA and off-the-shelf tools

Fig. 13 Cooperative test case generation using ut-Reducer and Combiner

The scenarios sketched in Sec. 6 assume that all tools potentially employed understand GIAs. This is, however (or rather, of course), not the case. To still enable cooperation, in particular while using the existing tools in a black-box manner, we need two more operators on GIAs: (1) a way of encoding the information in the artifact into the only form of input accepted by the majority of tools, i.e., programs, and (2) a way of combining several partial results about programs as given by GIAs into one GIA so that no information is lost.

We introduce two components that perform these operations: a Reducer for the former case and a Combiner for the latter. Figure 13 depicts an OA tool and a UA tool cooperating on the task of test case generation. In this scenario, we assume that we want to use an off-the-shelf tester that is UA and a Verifier, as depicted in Fig. 9. When the Verifier generates a GIA containing the information that certain paths of the program are unreachable, the ReducerFootnote 3 removes these paths from the program and generates a reduced program. This program is given to the Off-the-shelf Tester, which generates test cases for the reduced program. To be able to feed this information back into a GIA, we employ a Combiner to combine the information computed by the off-the-shelf tool with the GIA generated by the Verifier. The resulting GIA is then given to the Verifier and the cycle starts anew.

7.1 Reducer

For the first operation on GIAs, we use the concept of reducers as introduced in [18, 41]. A reducer reduces a program to a certain part by removing some paths, thereby allowing off-the-shelf tools to use the information computed by others. We define two different reducers, one removing paths that are \({ {ut}}\)-covered by the GIA and one removing those that are \({ {rt}}\)-covered.

Definition 6

An X-reducer for \(X \in \{ut,rt\}\) is a mapping \(red_X: {\mathcal {C}} \times {\mathcal {A}} \rightarrow {\mathcal {C}}\) satisfying

$$\begin{aligned} \forall C \in {\mathcal {C}}, A \in {\mathcal {A}}: P \subseteq {\mathcal {P}}(red_X(C,A)) \subseteq {\mathcal {P}}(C)\, \end{aligned}$$

where \(P= {\left\{ \begin{array}{ll} {\mathcal {P}}(C) {\setminus } {\mathcal {P}}_X(A) &{} \text { if } F_{{cand}}= \emptyset \text { in } A \\ {\mathcal {P}}_{{cand}}(A) {\setminus } {\mathcal {P}}_X(A) &{} \text { otherwise.} \end{array}\right. }\)

Algorithm 1 X-Reducer

A reducer for \(X=ut\) in the case that \(F_{{cand}}=\emptyset \) already exists [18]. In Alg. 1 we provide a parameterized reducer for both values of X, building on the existing one.Footnote 4 It first calls the existing reducer and obtains a program reduced wrt. X. As \({\mathcal {P}}_{{cand}}\) contains the set of interesting paths on which the succeeding tool should focus, X-Reducer minimizes the computed reduced CFA wrt. these paths (in lines 3 to 13). We get the following result:

Theorem 1

Algorithm 1 is an X-reducer according to Definition 6.

Proof

We first show that Algorithm 1 works correctly if \(F_{{cand}}\!=\!\emptyset \) holds: The algorithm Reducer called in line 2 takes an automaton with one set of final states F as input. It has been shown that Reducer retains at least all paths that are not covered by the given automaton w.r.t. F and that the generated program does not contain any path that is not present in the original program [18]. We call Reducer with the GIA A only using \(F_X\); thus, it reduces the program such that at most all paths that are \(X\)-covered by the GIA are removed. Therefore, Reducer and thus Algorithm 1 work correctly if \(F_{{cand}}\!=\!\emptyset \). In case that \(F_{{cand}}\!\ne \!\emptyset \), the reducer has to generate a program that contains at least all paths \({cand}\)-covered by A. In lines 3-13 we build a superset of these paths and remove the other paths, i.e., only those that are not \({cand}\)-covered by A. Thus, Algorithm 1 also works in this case, concluding the proof. \(\square \)
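The X-Reducer of Alg. 1 works on arbitrary GIAs; as a toy approximation of the reduction idea only, the following sketch handles the special case where the information from the GIA is given as a set of CFA locations lying exclusively on ut-covered paths, which are removed together with everything that becomes unreachable from the initial location. Names and the CFA encoding are illustrative, not a reimplementation of Alg. 1.

```python
def reduce_ut(locations, edges, initial, removed_locations):
    """Drop the given locations and restrict the CFA to the part that is still
    reachable from the initial location. `edges` are (source, op, target) triples."""
    keep = set(locations) - set(removed_locations)
    kept_edges = {(s, op, t) for (s, op, t) in edges if s in keep and t in keep}
    reachable, frontier = set(), [initial]
    while frontier:
        loc = frontier.pop()
        if loc in reachable:
            continue
        reachable.add(loc)
        frontier += [t for (s, _, t) in kept_edges if s == loc]
    return reachable, {(s, op, t) for (s, op, t) in kept_edges
                       if s in reachable and t in reachable}

# Removing l6 also drops l7, which is only reachable via l6.
locs = {"l0", "l1", "l5", "l6", "l7", "l8"}
edges = {("l0", "x = random()", "l1"), ("l1", "[x > 5]", "l6"),
         ("l6", "error()", "l7"), ("l1", "[x <= 5]", "l5"),
         ("l5", "y = x", "l8")}
print(reduce_ut(locs, edges, "l0", {"l6"}))
```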

Algorithm 2 Combiner

7.2 Combiner

When several tools compute analysis information, we have to make sure that all this information is preserved. To this end, we introduce a combiner for the combination of GIAs. The combiner’s goal is to keep all information on \({\mathcal {P}}_{ {ut}}\) and \({\mathcal {P}}_{ {rt}}\) from both GIAs.

Definition 7

A combiner is a partial mapping \(comb:{\mathcal {A}} \times {\mathcal {A}} \rightarrow {\mathcal {A}}\) which is defined on consistent GIAs \(A_1\) and \(A_2\) with \( {\mathcal {P}}_{ {ut}}(A_1) \cap {\mathcal {P}}_{ {rt}}(A_2) = \emptyset = {\mathcal {P}}_{ {rt}}(A_1) \cap {\mathcal {P}}_{ {ut}}(A_2)\) such that

$$\begin{aligned} \forall&A_1, A_2 \in {\mathcal {A}}: \\&{\mathcal {P}}_{ {ut}}(comb(A_1,A_2)) = {\mathcal {P}}_{ {ut}}(A_1) \cup {\mathcal {P}}_{ {ut}}(A_2) \\ {}&\wedge {\mathcal {P}}_{ {rt}}(comb(A_1,A_2)) = {\mathcal {P}}_{ {rt}}(A_1) \cup {\mathcal {P}}_{ {rt}}(A_2) \ . \end{aligned}$$

An algorithm for a combiner is given in Alg. 2, assuming for presentation purposes that each transition in \(\delta _1,\delta _2\) is labeled with only a single CFA transition. The intuitive idea of the Combiner is to build the union of the two GIAs and consider newly computed information: For example, if there is a path \(\pi \) with \(\pi \! \in \!{\mathcal {P}}_{{cand}}(A_1)\) and \(\pi \!\in \!{\mathcal {P}}_{ {ut}}(A_2)\), Combiner ensures that \(\pi \! \in \!{\mathcal {P}}_{ {ut}}(A_3)\) holds for the combined GIA \(A_3\). To this end, Combiner builds the new GIA \(A_3\) by searching for common sub-paths in the input GIAs \(A_1\) and \(A_2\). A state in \(A_3\) is a tuple \((a_1,a_2)\) of two states, \(a_1 \!\in \!Q_1\) and \(a_2 \!\in \! Q_2\), both reachable on the same path. If the paths diverge, the state is split, where the placeholders '\(\circ \)' and '\(\bullet \)' are used to replace either \(a_1\) or \(a_2\). We use, e.g., '\(\circ \)' if the transitions from \(a_1\) and from \(a_2\) contain different CFA edges and '\(\bullet \)' if the successor states have different state invariants. For combination, Alg. 2 applies the method Merge, given in Appendix B.
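Alg. 2 operates on the automaton structure itself. As a much simpler illustration of the contract from Def. 7 only, the following hedged sketch combines two GIAs whose languages are given as explicit finite sets of classified paths, letting definite results (ut/rt) take precedence over candidate status; it is not a reimplementation of Alg. 2.

```python
def combine(a1, a2):
    """Combine two consistent GIAs given as dicts {"ut": set, "rt": set,
    "cand": set} of (hashable) paths. Definite classifications are unioned;
    a path stays a candidate only if neither input settled it. This mirrors
    the requirement on P_ut/P_rt from Def. 7, not the product construction of Alg. 2."""
    ut = a1["ut"] | a2["ut"]
    rt = a1["rt"] | a2["rt"]
    assert not (ut & rt), "inputs are not consistent"
    cand = (a1["cand"] | a2["cand"]) - ut - rt
    return {"ut": ut, "rt": rt, "cand": cand}

a1 = {"ut": set(), "rt": {("l0", "l1", "l3")}, "cand": {("l0", "l1", "l6", "l7")}}
a2 = {"ut": {("l0", "l1", "l6", "l7")}, "rt": set(), "cand": set()}
print(combine(a1, a2)["ut"])   # the candidate from a1 is now known to be unreachable
```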

Fig. 14 GIAs used and generated during cooperative test case generation for the example program of Fig. 2, with states of \(F_{{ {ut}}}\) marked green, of \(F_{{ {rt}}}\) blue and of \(F_{{cand}}\) yellow. We elide state invariants (all true) and depict for transitions only the operation and non-true conditions. We define op := 'x = random();'

An application of Alg. 2 for the program from Fig. 3 is depicted in Fig. 14. We use the two GIAs \(A_1\) (in Fig. 14a) and \(A_2\) (in Fig. 14b) as inputs; the resulting GIA \(A_3\) is shown in Fig. 14c, where we elided paths that are contained twice to increase readability. \(A_1\) is generated during test case generation by an UA tool, containing the information that the target node \(\ell _3\) is reachable when \(x=0\) holds. \(A_2\) is produced by an OA tool that marks the target node \(\ell _7\) as unreachable. As both \(A_1\) and \(A_2\) contain a path to \(\ell _7\), but \(q_7\in F_{{cand}}\) in \(A_1\) and \(s_7 \in F_{{ {ut}}}\) in \(A_2\), the combiner generates a successor using '\(\bullet \)' instead of \(q_7\) for \(A_3\) to maintain the information that \(\ell _7\) is unreachable. In contrast, the successor of the state \((q_6',\!s_6)\) is \((q_7',\circ )\), as \(q_6\) has a successor \(q_7\) but \(s_6\) does not. Using '\(\circ \)' instead of '\(\bullet \)' ensures that \((q_7',\circ )\) is not in \(F_{{cand}}\) in \(A_3\) (cf. line 24 in Alg. 2), because \(A_2\) contains the information that this node is unreachable.

Additionally, Combiner maintains more precise information on paths from \({\mathcal {P}}_{{cand}}\): If a path \(\pi \) is present in \({\mathcal {P}}_{{cand}}(A_1)\) and \({\mathcal {P}}_{{cand}}(A_2)\), once with and once without condition, the condition is also present on the path in the combined GIA. In our example, both \(A_1\) and \(A_2\) contain a path covering \(\ell _5\). \(A_1\) has a path (\(q_1,q_2',\ldots \)) with the condition true and \(A_2\) a path (\(s_1,s_2,\ldots \)) labeled with \(x=5\).

In the combined GIA \(A_3\), the condition \(x=5\), and thus the more precise information, is maintained. The resulting GIA is not guaranteed to be minimal, meaning that it may contain some paths multiple times and may contain paths that do not lead to an accepting state. For example, \(A_3\) contains two paths that both reach \(\ell _5\) with the same condition.

Theorem 2

Algorithm 2 is a combiner according to Definition 7.

Proof

Intuitively, we have to show that for the combination A of two GIAs \(A_1\) and \(A_2\) each path \({ {rt}}\)-covered by either \(A_1\) or \(A_2\) is also \({ {rt}}\)-covered by A and that the reverse holds (and analogously, that both properties hold for \({ {ut}}\)-covered paths). We therefore inductively construct an accepting run of A for a path \(\pi \) that is \({ {rt}}\)-covered by either \(A_1\) or \(A_2\) and vice versa. The full formal proof can be found in Appendix C. \(\square \)

7.3 Using reducer and combiner

Finally, we can state that connecting tools via reducers and combiners does not lose any of the already computed analysis results, as depicted in Fig. 13. This property guarantees that any arbitrary combination of sound OA and UA tools achieves the same progress as the employed tools.

Theorem 3

Let \(A \!\in \!{\mathcal {A}}\) be a correct GIA, \(C \!\in \!{\mathcal {C}}\) a CFA, tool a sound UA or OA analysis and \(X\! \in \! \{ut,rt\}\). Then, for a GIA \(A' = comb(\textsf{tool}(red_X(C,A)), A)\) we get

  • \({\mathcal {P}}_{ {rt}}(A') = {\mathcal {P}}_{ {rt}}(A) \wedge {\mathcal {P}}_{ {ut}}(A') \supseteq {\mathcal {P}}_{ {ut}}(A)\) if tool is an OA, and

  • \({\mathcal {P}}_{ {ut}}(A') = {\mathcal {P}}_{ {ut}}(A) \wedge {\mathcal {P}}_{ {rt}}(A') \supseteq {\mathcal {P}}_{ {rt}}(A)\) if tool is an UA.

Proof

As sound OA and UA tools increase the set of \({ {ut}}\)-covered resp. \({ {rt}}\)-covered paths and the reducer retains all this information, the correctness follows directly from Theorem 2 and Definition 5. \(\square \)

Let us revisit Fig. 13, used to exemplify the construction of combiner and reducer in the setting of cooperative test case generation, on a more concrete level. We use an OA tool called Verifier working on GIAs as well as ut-Reducer and Combiner for an Off-the-shelf Tester which does not understand GIAs. When started on the program from Fig. 3, ut-Reducer is called with the initial, empty GIA and generates the reduced program, which is the same as the original one. Next, Off-the-shelf Tester finds a test suite covering \(\ell _3\) and generates the GIA \(A_1\) depicted in Fig. 14a, which is merged with the empty GIA, leaving \(A_1\) unchanged. The original program and \(A_1\) are given to Verifier, which (1) computes that \(\ell _7\) is unreachable and (2) computes a path potentially leading to \(\ell _5\) and \(\ell _9\) under the condition \(x=5\). The Verifier computes the GIA \(A_3\), depicted in Fig. 14c, which also contains all information of \(A_1\). As not all target nodes are covered by \(F_{{ {rt}}}\) and \(F_{{ {ut}}}\) in \(A_3\), a second iteration starts: At first, the ut-Reducer computes the reduced program containing only the else-branch starting in line 5. Off-the-shelf Tester confirms that \(\ell _5\) and \(\ell _9\) are reachable. This information, encoded as a GIA, is finally combined with \(A_3\). Now, all target nodes are either covered or identified as unreachable, and hence the computation stops.

8 Implementation

To demonstrate the feasibility of GIAs as an exchange format and to show that the developed theoretical concepts work in practice, we exemplarily realized two conceptually different forms of cooperation: cooperative test case generation, as described in Sec. 6 and depicted in Fig. 13, and component-based CEGAR (C-Cegar [24]) using only GIAs as exchange format, as explained in Sec. 6 and depicted in Fig. 10 and Fig. 13.

We implemented GIAs based on condition automata and realized our instances of cooperative test case generation and component-based CEGAR using CoVeriTeam [19]. CoVeriTeam is a framework that provides an easy way to build different forms of cooperative software verification. It provides a language to describe the communication between different components and their inputs and outputs. The language allows for combining different actors in sequence, in parallel, or in a cyclic manner. We integrated the GIA as an exchange format into CoVeriTeam. In our setting, we use CoVeriTeam for orchestration as well as for monitoring the progress of the composition, i.e., checking whether all target nodes are already covered.
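A check in this spirit (cf. the all_targets_handled placeholder in the sketch above, here operating directly on sets of covered target nodes rather than on the GIA) could look as follows; the representation as plain sets is again our own simplification.

def all_targets_handled(target_nodes, ut_covered, rt_covered):
    # the composition may stop once every target node is either rt-covered
    # (a feasible path reaching it is known) or ut-covered (it is known to
    # be unreachable)
    return all(t in ut_covered or t in rt_covered for t in target_nodes)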

Additionally, we built modules within CPAchecker [21] that allow processing a GIA as input as well as generating a GIA as output. Thereby, we can reuse existing analyses of CPAchecker for the evaluation. We built the ut-reducer and rt-reducer described in Alg. 3 as well as the combiner from Alg. 2 within CPAchecker, forming a standalone executable component that is also fully integrated into CoVeriTeam.

Component-based CEGAR. The original implementation of component-based CEGAR (C-Cegar), here called CC-Wit, contains three components, namely a model explorer, a feasibility checker, and a precision refiner, which are executed in a loop and exchange correctness and violation witnesses. Our re-implementation CC-Gia also contains these three components, as depicted in Fig. 10. Note that CEGAR assumes that the feasibility checker is precise in the sense that it reports a counterexample as spurious only if all paths covered by the counterexample have been checked. Otherwise, the same real counterexample may be discovered multiple times, causing an infinite refinement loop. This situation is prevented by using a feasibility checker that exhaustively checks all paths in the potential counterexample.

For CC-Gia, we can use the existing realizations of model explorer, feasibility checker, and precision refiner in CPAchecker, as we updated CPAchecker such that it can process GIAs as input and generate them as output. As the precision refiner in C-Cegar focuses on refining the latest infeasible counterexample generated by the model explorer, we additionally use a combiner to ensure that the precision increments computed in previous iterations are maintained. Note that exchange formats like violation and correctness witnesses can be translated into GIAs, which allows using any off-the-shelf tool that produces these artifacts as output.
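The following Python sketch summarizes one CC-Gia round trip; explorer, checker, refiner, and combine are placeholders of our own and not the actual component interfaces of CPAchecker.

def cc_gia(program, explorer, checker, refiner, combine, empty_gia):
    precision_gia = empty_gia  # accumulates all precision increments
    while True:
        cex_gia = explorer(program, precision_gia)
        if cex_gia is None:
            return "proof"  # no path to the target node remains
        if checker(program, cex_gia) == "feasible":
            return "alarm"  # a real counterexample has been found
        # the counterexample is spurious: the refiner encodes the newly
        # discovered predicate as an assumption on the infeasible path
        increment_gia = refiner(program, cex_gia)
        # the combiner ensures that increments from previous iterations
        # are not lost
        precision_gia = combine(increment_gia, precision_gia)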

Cooperative test case generation. To realize cooperative test case generation using GIAs as an exchange format, we follow the tooling used by Daca et al. [42]: We employ the concolic tester Crest [31] as UA tool and CPAchecker's predicate analysis using CEGAR as OA analysis. Crest is a concolic tester, meaning that inputs are not only generated randomly or using a heuristic; instead, paths are encoded as formulae over symbolic inputs and solved by an SMT solver, in this case Yices [45]. Hence, Crest will eventually generate test inputs covering all reachable branches. As Crest is a testing tool under-approximating the state space, it is not able to identify paths of the program as unreachable. We therefore combine it with a predicate analysis from CPAchecker. The predicate analysis over-approximates the reachable state space and can thus mark target nodes as unreachable. Due to the precisely defined and uniformly applicable semantics of the GIA, we reuse the modules in CPAchecker that we built for CC-Gia. We employed parts of TBF [22] to let Crest generate test inputs in the TestComp test case format and generated a GIA for them. We additionally optimized the resulting GIA by removing duplicate paths, i.e., paths that traverse the same nodes but are labeled with different assumptions. Within each iteration, Crest is started and generates at most 100 test inputs before the predicate analysis is called to identify unreachable target nodes. The computation is complete if all target nodes are ut-covered or rt-covered by the generated GIA. In the last step, we extract a test suite from that GIA by traversing its paths leading to \(F_{{ {rt}}}\) and collecting all assumptions on the return values of random. The resulting cooperative test case generation approach is called CoTest.
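The extraction step at the end can be pictured by the following sketch; the path representation (a pair of visited nodes and the sequence of recorded return values of random) is our own simplification of the GIA, not the actual data structure.

def extract_test_suite(gia_paths, f_rt):
    # gia_paths: list of (nodes, random_values) pairs, where random_values is
    # the sequence of return values of random recorded as assumptions along
    # the path; f_rt: the set of accepting nodes for feasible paths
    suite = []
    for nodes, random_values in gia_paths:
        if nodes[-1] in f_rt:  # only paths known to reach a target
            suite.append(tuple(random_values))
    return suite

# Each entry of the resulting suite is one test case, i.e., the sequence of
# input values that drives the program along the corresponding path.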

We additionally used Crest standalone for comparison with CoTest. In the first step, Crest is used in the default configuration, generating at most 100 000 test cases in its internal format. Afterward, TBF is used to remove duplicate tests and transform the test cases into a test suite in the TestComp format, which is needed to measure the coverage of the generated test suite. To ensure that the test suite is generated in time, we stopped Crest after 80% of the available time and started the transformation.

9 Evaluation

The goal of the evaluation is twofold: First, we exemplarily show that GIAs are feasible as an exchange format and can be used in two different usage contexts, one for verification and one for test case generation. Second, we demonstrate the advantages of the clearly defined semantics of GIAs, which allows precisely encoding information for the exchange between analyses. As the goal of verification is to show the (un)reachability of a certain error location, each program usually has a single target node. Hence, the verification task is completed if the target node is either shown to be unreachable or a concrete path leading to the target node is found. In contrast, for test case generation, most tasks contain multiple target nodes, some of them reachable and others not. Therefore, we evaluate whether the conceptual advantages of encoding information on reachable and unreachable target nodes within a single artifact can be witnessed in cooperative test case generation and lead to more precise results or a faster computation. The implementation and evaluation of this use case is an extension over the conference paper [56]. We therefore study the following two research questions:

RQ 1. Are GIAs feasible as exchange formats for component-based CEGAR?

RQ 2. Can (cooperative) test case generation also benefit from using GIAs for information exchange?

RQ 1 focuses on the usage of GIAs on a fine-grained scale, as CEGAR is usually employed within a single tool. In contrast, in RQ 2 we build a cooperation between two standalone, off-the-shelf tools. Note that we could also employ component-based CEGAR using GIAs in RQ 2, but as the tightly coupled version of CEGAR currently performs better (cf. [24]), we decided to use the tightly coupled one.

9.1 RQ 1: Component-based CEGAR

The goal of verification in this setting is to either find a concrete path leading to the target node (an alarm) or to compute a proof that the target node is not reachable. To evaluate the feasibility of GIAs as an exchange format, we compare the existing implementation of C-Cegar (CC-Wit), which uses violation and correctness witnesses as exchange formats between the three components, with our re-implementation (CC-Gia), which only makes use of GIAs for the information exchange. For the comparison, we are interested in whether the same tasks that are solved by CC-Wit can also be solved by CC-Gia, and whether there are tasks that can only be solved using GIAs as exchange format. Thus, we compare the effectiveness (number of solved tasks) of CC-Wit and CC-Gia. In addition, we want to study whether there is an effect on the execution time when the same information is encoded in a different format, i.e., using GIAs instead of correctness and violation witnesses. Therefore, we compare the efficiency (consumed CPU time to compute the solution) of CC-Gia and CC-Wit.

Table 1 Comparison of the existing CC-Wit with the cooperation using only GIA for information exchange (CC-Gia)

Evaluation Setup. All experiments were run on machines with an Intel Xeon E3-1230 v5, 3.40 GHz (8 cores), 33 GB of memory, and Ubuntu 22.04 LTS. Each tool is limited to 15 GB of memory, 4 CPU cores, and 15 min of CPU time per verification run. All experiments were executed using BenchExec [25], which enforces the resource limitations. We evaluated both approaches on the SV-Benchmarks, the largest publicly available benchmark collection of C programs, in the version used for SV-Comp ’22, containing in total 8 347 tasks. We used CPAchecker in version 2.1.2, CoVeriTeam in version 0.9, and BenchExec in version 3.11.

Fig. 15 Program from SV-Comp, where \(x = 0\) is not a valid invariant at the loop head

Evaluation Results (Effectiveness). Table 1 contains the experimental results of CC-Gia and CC-Wit. It reports the number of overall correct answers, the correct proofs (where an approach correctly detects that no target node is reachable), and the correct alarms (where a feasible path to a target node is computed). In addition, the incorrect answers are reported, as well as the number of tasks where CC-Gia computes the correct result but CC-Wit does not (row add. solved).

For the total number of correctly solved tasks, we observe that CC-Gia can solve 94% of all tasks solved by CC-Wit. Within these 94%, the number of iterations and the computed refinements are almost always equal. The decrease originates mostly from the fact that CC-Gia is not able to compute a solution within the given time limit for 259 tasks for which CC-Wit computes a solution within 900 s.

When looking at the additionally solved tasks, we can see the advantages of using GIAs: In 114 cases, CC-Gia computes the correct result, whereas CC-Wit either runs into a timeout or aborts the computation as it eventually makes no progress and gets stuck. Both situations are caused by the fact that not all information computed by the precision refiner is added to the correctness witness, a situation that does not occur when using GIAs. In [24], the authors argue that this is because correctness witnesses are not primarily designed for the exchange of a precision increment. The semantics of the GIA allows the precision refiner to encode this information, i.e., to encode that a newly discovered predicate holds at a certain point of the infeasible counterexample path. Therefore, the refiner builds a GIA that contains only the infeasible counterexample, whose last state is in \(F_{{ {ut}}}\), while the precision increment is encoded as an assumption on that path.
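Schematically, such a refiner output can be pictured as follows; the data types and the statement labels (stmt_1, stmt_2) are our own placeholders for illustration and do not reflect CPAchecker's internal GIA representation.

from dataclasses import dataclass, field

@dataclass
class Edge:
    source: str
    target: str
    statement: str            # CFA edge covered by this GIA edge
    assumption: str = "true"  # precision increment, if any

@dataclass
class GIA:
    initial: str
    edges: list
    f_ut: set = field(default_factory=set)  # accepting states of infeasible paths
    f_rt: set = field(default_factory=set)  # accepting states of feasible paths

# GIA containing only the infeasible counterexample; the newly discovered
# predicate is attached as an assumption, and the last state lies in F_ut.
refiner_output = GIA(
    initial="q0",
    edges=[Edge("q0", "q1", "stmt_1"),
           Edge("q1", "q2", "stmt_2", assumption="y == 0"),
           Edge("q2", "q_err", "call to error")],
    f_ut={"q_err"},
)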

To exemplify the advantages, Fig. 15 presents a program taken from the SV-Benchmarks collection. The target node is the call to error in line 11. Within the first iteration of C-Cegar, the model explorer computes a potential counterexample that does not enter the loop starting in line 5. The GIA containing the potential counterexample is given to the feasibility checker, which identifies it as infeasible. Next, the precision refiner computes the interpolant \(y=0\). This formula is encoded within the GIA generated by the precision refiner (depicted in Fig. 16a) and is given to the model explorer. Now, the second iteration starts and the model explorer computes a counterexample that traverses the loop once, depicted in simplified form in Fig. 16b. Again, the feasibility checker rejects the counterexample as infeasible, and it is given to the precision refiner. The precision refiner now computes a new interpolant, namely \(x=0\), that is valid after the first loop iteration but invalid before it. As stated in [24], the precision refiner in CC-Wit fails to encode this new predicate within the correctness witness. In contrast, using GIAs as an exchange format allows the precision refiner to build the GIA depicted in Fig. 16c, which precisely encodes the spurious counterexample as a path leading to \(s_5\in F_{{ {ut}}}\), where the new predicate is present as an assumption at the edge to \(s_3\) (after the first loop iteration). In order not to lose the information on the interpolant computed in the first iteration, the two GIAs from Fig. 16a and Fig. 16c are combined into the GIA depicted in Fig. 16d. It contains two paths: one with the precision increment \(x=0\), and one with the precision increment \(y=0\). In the third iteration, these two predicates (and their negations) are sufficiently precise, such that the model explorer proves all paths leading to the target node unreachable.

Evaluation Results (Efficiency). Figure 17 compares the efficiency of CC-Gia and CC-Wit per task on a logarithmic scale. A point (x, y) represents the CPU time taken by CC-Gia (as x) and by CC-Wit (as y) for a task where both compute the correct solution or one runs into a timeout (TO). We observe that CC-Gia in general needs more time to find a solution, as most points are below the diagonal. In the vast majority of cases, the increase is smaller than a factor of two (lower dashed line). The CPU time increases on average by a factor of 1.4 (standard deviation 0.4), and the median increase is a factor of 1.3. In CC-Wit, information from correctness witnesses is joined using a syntactic approach, which is fast and, as it is only applied within this setting, expresses the precision increment in a way optimized for C-Cegar. In contrast, CC-Gia employs the Combiner, which takes into account the semantics of the two GIAs being combined to guarantee that no information is lost. The resulting GIA is significantly larger (contains more states and edges) and not optimized for C-Cegar, which most likely causes the increased runtime and the number of timeouts.

Fig. 16 Example showing the advantages of using GIAs compared to correctness witnesses as exchange formats

Fig. 17 Comparison of CPU time for CC-Wit and CC-Gia

The evaluation shows that GIAs are a flexible, precise, and practically suitable exchange format, applicable for C-Cegar. In particular, we see that the drawbacks of CC-Wit, namely losing information on computed precision increments, can be overcome. As a downside, the overall efficiency slightly decreases when using GIAs, due to their size and the fact that they are not optimized for specific applications.

9.2 RQ 2: Cooperative test case generation

Test case generation aims at finding program inputs such that either a certain statement (statement coverage) or all branching points (branch coverage) are visited at least once when executing the program with the given inputs. As we want to evaluate the cooperation of a tester and an OA analysis technique, we focus on branch coverage, as it generally yields several target nodes per program. We compare the branch coverage of the test suites generated by our cooperative test case generation approach (called CoTest) and by Crest as a standalone tool.

Evaluation Setup. We used the same evaluation machines as in RQ 1, but limited the time for test case generation to 5 minutes. We evaluated both approaches on a small subset of the SV-Benchmarks, in the version used for TestComp ’22. As we are interested in exemplarily showing the usefulness of GIAs in the cooperative test case generation setting, we selected a subset of the tasks from the ControlFlow category of TestComp (tasks l5-l15.2) and used the running example of Fig. 2 (task a1) as well as an extended version, given in Appendix D in Fig. 18 (task a2). The selected tasks work with simple integer variables and do not make use of arrays or pointers. The tasks selected from the SV-Benchmarks contain an infinite loop in which all variables have the same value at the start of each iteration. Thus, the loop does not affect the reachability of the target nodes. As our reducer implementation works best for loop-free programs, and to avoid infinite computation during the coverage measurement, we removed the loop.

Table 2 Results of test case generation for CoTest and Crest

To compute the coverage of the generated test suites, we used TestCov [23], the tool also used in TestComp. We ran TestCov with a 30-minute timeout, in contrast to TestComp, where only a five-minute timeout is used. We used CPAchecker in version 2.1.2, CoVeriTeam in version 0.9, TestCov in version 3.6, and BenchExec in version 3.11.

Evaluation Results. Table 2 contains the experimental results for Crest and our cooperative test case generation approach CoTest. It reports, for each task, the size of the generated test suite (column #tests), the coverage achieved with the test suite, and the CPU time taken to compute it.

We generally observe that both tools generate test suites with nearly the same code coverage, but the test suite generated by CoTest is significantly smaller than the one generated by Crest. On average, the test suite generated by CoTest has only 0.024% of the size of the test suite generated by Crest; the ratio ranges from 0.006% (largest difference) to 0.038% (smallest difference). In other words, the test suite generated by Crest within the given time limits contains on average about 4 100 times more test cases than the test suite generated by CoTest.
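As a rough consistency check of these numbers: \(1/(0.024\%) = 1/0.00024 \approx 4167\), which matches the reported factor of roughly 4 100 up to rounding; the exact factor depends on how the per-task ratios are averaged.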

The significantly smaller size of the test suites generated by CoTest, i.e., the number of test cases in the suite, can be explained as follows: Each GIA contains the information which target nodes are reached for each test input. Hence, detecting test cases that follow the same path within the CFA, and thus lead to the same target nodes, is easy. This allows us to reduce the number of paths within the GIA and thereby the size of the test suite extracted in the end. Although Crest used within CoTest generates up to 100 test cases per iteration, the evaluation indicates that using GIAs allows for a reduction of the test suite by at most 80% on the benchmark set used, as for test cases that follow the same path within the CFA, and hence cover exactly the same target nodes, only one test case per path is exported. We also observe the advantage of the significantly smaller test suite, as TestCov is not able to process the full test suite generated by Crest within the given time limit.
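A minimal sketch of this reduction, keeping one representative test case per CFA path, could look as follows; this is again our own abstraction for illustration, not the actual implementation.

def reduce_by_path(test_cases):
    # test_cases: list of (cfa_path, inputs) pairs, where cfa_path is the
    # tuple of CFA locations visited when executing the inputs
    representative = {}
    for cfa_path, inputs in test_cases:
        # test cases following the same path reach exactly the same target
        # nodes, so a single representative per path suffices
        representative.setdefault(cfa_path, inputs)
    return list(representative.values())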

For two tasks (a2, l10), CoTest covers more branching points than Crest. Due to the size of the test suites generated by Crest, TestCov can only analyze around 10% of them before reaching the given time limit, which is most likely the reason for the lower measured coverage. For two tasks (l14.2 and l15.1), CoTest was not able to cover all target nodes within the given time restrictions. As we transform the GIA into a test suite only if all target nodes are covered, no test suite is generated in these two cases.

When comparing the CPU time consumed to generate the test suites, we observe that CoTest completes the test case generation task faster than Crest. In the median, CoTest finishes the computation in only 28% of the time taken by Crest. As GIAs allow precisely encoding information on reachable and unreachable target nodes in a single artifact, the predicate analysis can mark all unreachable target nodes as such, and Crest can report all paths to target nodes within the same GIA. Thereby, the computation can be stopped as soon as all target nodes are either ut-covered or rt-covered. In contrast, Crest running standalone cannot detect that all target nodes are covered when some of them are unreachable. Thus, it continues the test case generation until it reaches its internal timeout of 240 s.

In summary, GIAs are also suited as an exchange format for cooperative test case generation, as they allow encoding information on reachable and unreachable target nodes within a single artifact. Due to the precisely defined semantics, it can easily be detected whether the task is already completed. Thereby, cooperative test case generation also benefits from using GIAs as an exchange format.

9.3 Threats to validity

There exist multiple concepts for cooperative software validation. We have implemented two of them and have experimentally shown that GIAs are suited as an exchange format that guarantees that no information is lost. For the other concepts of cooperative combinations using standardized exchange formats, namely CMC and CoVEGI, we explained how GIAs could replace the used artifacts. Thus, the findings from our evaluation will most likely carry over to these other forms of cooperation, meaning that GIAs can be applied in different scenarios as well.

Nevertheless, there are two underlying assumptions when using GIAs for exchanging information: First, we assume that each tool that either takes a GIA as input or produces a GIA works on the original C program or the reduced program generated by the reducer. In case a tool works on a different representation (e.g., LLVM or Boogie), it has to be ensured that the information generated by the tool is mapped back to the original C program. In such cases, one can apply the concept of mappers and adapters proposed in [55] to map the information back to the level of the C program.

Second, we assume that the task solved by the cooperating tools can be expressed in terms of reachability, and thus the information communicated can be expressed in terms of reachability as well, e.g., the (non-)reachability of certain locations or conditions, and state invariants that hold on some or all paths to a certain location or function. Although test case generation and verification of the correctness of a system can be expressed in terms of reachability, there might be other correctness criteria or properties that cannot, meaning that GIAs might not be applicable as an exchange format. In addition, GIAs only allow expressing additional information in terms of predicates. Hence, concrete (input) values needed for executing a specific path, or information on variable values gathered during the analysis, e.g., interval values as depicted in Fig. 7, need to be transformed into predicates. For example, concrete variable values can be expressed as assignments, and the information \(x \in [1,4]\) is translated to the formula \(1 \le x \le 4\). In case a combination of analyses exchanges analysis information that is not representable using predicates, it is likely that the information cannot be encoded within a GIA (or any other instance of a protocol automaton).
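The translation mentioned above can be as simple as the following sketch; the function names are ours and only serve illustration.

def interval_to_predicate(var, low, high):
    # x in [1,4]  becomes  1 <= x && x <= 4
    return f"{low} <= {var} && {var} <= {high}"

def assignment_to_predicate(var, value):
    # a concrete input value becomes an equality predicate
    return f"{var} == {value}"

print(interval_to_predicate("x", 1, 4))  # 1 <= x && x <= 4
print(assignment_to_predicate("x", 5))   # x == 5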

Although these two assumptions might limit the use of GIAs in certain special cases, all in all, we think that GIAs can encode the information that is typically exchanged between OA and UA tools.

10 Conclusion

In this article, we have proposed generalized information exchange automata (GIAs) as an exchange format for the cooperation of over- and under-approximating analyses. The format has a fixed, well-defined semantics, allowing its application in different scenarios. We have furthermore defined and implemented two operations on GIAs: reducing a program to the (remaining) task and combining results with previously computed information. These operations allow the reuse of off-the-shelf tools. We have formally shown that applying reducer and combiner maintains all relevant computed information. The feasibility of GIAs as an exchange format has been demonstrated by applying them in an existing cooperative verification setting (C-Cegar) and in a test case generation setting.

For future work, we plan to implement other existing forms of combining OA and UA off-the-shelf tools in a cooperative setting using GIAs for the information exchange, such as conditional model checking or k-induction. GIAs are also well suited for a parallelized cooperative setting, where multiple tools work side by side on the same task to increase the overall performance, as combining arbitrary GIAs guarantees that no information is lost.