
1 Introduction

Recent years have seen enormous progress in automatic software verification, driven, among other things, by annual competitions like SV-COMP [13]. Software verification tools employ a range of different analysis techniques, such as predicate analysis, bounded model checking, k-induction, property-directed reachability, or automata-based methods. However, as none of these techniques is superior to the others, a form of cooperative verification [24] is often employed today. The idea of cooperative verification is to have different sorts of analyses cooperate on the task of software verification. This principle has already been implemented in various forms [16, 19, 33, 59], in particular also as cooperations of testing and verification tools [10, 39, 41, 42]. Such cooperations most often take the form of sequential combinations, where one tool starts with the full task, stores its partial analysis result within some verification artefact, and the next tool then works on the remaining task. In contrast, parallel execution of different tools is mostly done only by portfolio approaches, which simply run the different tools on the same task in parallel. One reason for using portfolios when employing parallel execution is that it is unclear how best to split a program into parts on which different tools could work separately while still being able to join their partial results into one for the entire program.

With ranged symbolic execution, Siddiqui and Khurshid [86] proposed one such technique for splitting programs into parts. The idea of ranged symbolic execution is to scale symbolic execution by splitting path exploration across several workers, thereby allowing the workers, in particular, to operate in parallel. To this end, they defined so-called path ranges. A path range describes a set of program paths defined by two inputs to the program, where the path \(\pi _1\) triggered by the first input is the lower bound and the path \(\pi _2\) for the second input is the upper bound of the range. All paths in between, i.e., paths \(\pi \) such that \(\pi _1 \! \le \! \pi \! \le \! \pi _2\) (based on some ordering \(\mathord {\le }\) on paths), make up a range. A worker operating on a range performs symbolic execution on paths of the range only. In their experiments, Siddiqui and Khurshid investigated one form of splitting via path ranges, namely by randomly generating inputs, which then make up a number of ranges.

In this paper, we generalize ranged symbolic execution to arbitrary analyses. In particular, we introduce the concept of a ranged analysis to execute an arbitrary analysis on a given range and compose different ranged analyses, which can then operate on different ranges in parallel. Also, we propose a novel splitting strategy, which generates ranges along loop bounds. We implemented ranged analysis in the software verification tool CPAchecker [21], which already provides a number of analyses, all defined as configurable program analyses (CPAs). To integrate ranged analysis in CPAchecker, we defined a new range reduction CPA, and then employed the built-in feature of analysis composition to combine it with different analyses. The ranged analyses obtained in this way are then run on different ranges in parallel, using CoVeriTeam [20] as the tool for orchestration. We furthermore implemented two strategies for generating path ranges: our novel strategy employing loop bounds for defining ranges, and the original random splitting technique. A loop bound n splits program paths into ranges that enter the loop at most n times and ranges that enter it more than n times.

Our evaluation on SV-COMP benchmarks [36] first of all confirms the results of Siddiqui and Khurshid [86] in that symbolic execution benefits from a ranged execution. Second, our results show that a loop-bound based splitting strategy brings an improvement over random splitting. Finally, we see that a composition of ranged analyses can solve analysis tasks that none of the (different) constituent analyses of a combination can solve alone.

2 Background

We start by introducing some notations on programs, defining path ranges, and introducing configurable program analysis as implemented in CPAchecker.

Fig. 1. Example program mid (taken from [86]) and its CFA

2.1 Program Syntax and Semantics

For the sake of presentation, we consider simple, imperative programs with a deterministic control-flow with one sort of variables (from some set \(\mathcal {V}\)) only. Formally, we model a program by a control-flow automaton (CFA) \(P =(L,\ell _0, G)\), where \(L\subseteq Loc \) is a subset of the program locations \(Loc \) (the program counter values), \(\ell _0\in L\) represents the beginning of the program, and control-flow edges \(G\subseteq L\times Ops\times L\) describe when which statements may be executed. Therein the set of statements \(Ops\) contains all possible statements, e.g., assume statements (boolean expressions over variables \(\mathcal {V}\), denoted by \( BExpr \)), assignments, etc. We expect that CFAs originate from program code and, thus, control-flow may only branch at assume operations, i.e., CFAs \(P =(L,\ell _0, G)\) are deterministic in the following sense. For all \((\ell ,op',\ell '), (\ell ,op'',\ell '')\in G\) either \(op'=op''\wedge \ell '=\ell ''\) or \(op', op''\) are assume operations and \(op'\equiv \lnot (op'')\). We assume that there exists an indicator function \(B _P: G \rightarrow \{T, F, N\}\) that reports the branch direction, either N(one), T(rue), or F(alse). This indicator function assigns \(N\) to all edges without assume operations and for any two assume operations \((\ell ,op',\ell '), (\ell ,op'',\ell '')\in G\) with \(op'\ne op''\) it guarantees \(\{B _P ((\ell ,op',\ell ')), B _P ((\ell ,op'',\ell ''))\}=\{T,F\}\). Since CFAs are typically derived from programs and assume operations correspond to the two evaluations of conditions of, e.g., if or while statements, the assume operation representing the true evaluation of the condition is typically assigned \(T\). We will later need this indicator function for defining path orderings.

Figure 1 shows our example program mid, which returns the middle value of the three input values, and its CFA. For each condition of an if statement it contains one assume edge for each evaluation of the condition, namely solid edges labelled by the condition for entering the if branch after the condition evaluates to true and dashed edges labelled by the negated condition for entering the else branch after the condition evaluates to false, i.e., the negated condition evaluates to true. All other statements are represented by a single edge.

We continue with the operational semantics of programs. A program state is a pair \((\ell ,c)\) of a program location \(\ell \in L\) and a data state \(c\) from the set \(C\) of data states that assign to each variable \(\texttt{v}\in \mathcal {V}\) a value of the variable’s domain. Program execution paths \(\pi = (\ell _0,c_0)\xrightarrow{g_1}(\ell _1,c_1)\xrightarrow{g_2}\cdots \xrightarrow{g_n}(\ell _n,c_n)\) are sequences of states and edges such that (1) they start at the beginning of the program and (2) only perform valid execution steps that (a) adhere to the control-flow, i.e., \(\forall 1\le i\le n: g_i=(\ell _{i-1},\cdot , \ell _i)\), and (b) properly describe the effect of the operations, i.e., \(\forall 1\le i\le n: c_i=sp_{op_i}(c_{i-1})\), where the strongest postcondition \(sp_{op_i} : C \rightharpoonup C\) is a partial function modeling the effect of the operation \(op_i \in Ops\) of edge \(g_i\) on data states. Execution paths are also called feasible paths, and paths that fulfil properties (1) and (2a) but violate property (2b) are called infeasible paths. The set of all execution paths of a program \(P \) is denoted by \(paths(P)\).

2.2 Path Ordering, Execution Trees, and Ranges

Our ranged analysis analyses sets of consecutive program execution paths. To specify these sets, we first define an ordering on execution paths. Given two program paths \(\pi = (\ell _0,c_0)\xrightarrow{g_1}\cdots \xrightarrow{g_n}(\ell _n,c_n)\) and \(\pi ' = (\ell '_0,c'_0)\xrightarrow{g'_1}\cdots \xrightarrow{g'_m}(\ell '_m,c'_m)\), we define their order \(\le \) based on their control-flow edges. More specifically, edges with assume operations representing a true evaluation of a condition are smaller than the edges representing the corresponding false evaluation of that condition. Following this idea, \(\pi \le \pi '\) if \(\exists \ 0\le k\le n: \big (\forall \ 1\le i\le k: g_i=g'_i\big ) \wedge \big ((n=k \wedge m\ge n) \vee (m>k\wedge n>k \wedge B _P (g_{k+1})=T \wedge B _P (g'_{k+1})=F)\big )\). An execution tree is a tree containing all execution paths of a program with the previously defined ordering, where nodes are labelled with the assume operations.

Based on the above ordering, we now specify ranges, which describe sets of consecutive program execution paths analysed by a ranged analysis and which are characterized by a left and right path that limit the range. Hence, a range \([\pi , \pi ']\) is the set \(\{\pi _r\in paths(P)\mid \pi \le \pi _r\le \pi '\}\). To easily describe ranges that are not bound on the left or right, we use the special paths \(\pi _{^{_\bot }}, \pi ^{_{\top }}\notin paths(P)\), which are smaller and greater, respectively, than every path, i.e., \(\forall \pi \in paths(P): (\pi \le \pi ^{_{\top }})\) \(\wedge ~ (\pi ^{_{\top }}\not \le \pi ) \wedge (\pi _{^{_\bot }}\le \pi ) \wedge (\pi \not \le \pi _{^{_\bot }})\). Consequently, \([\pi _{^{_\bot }},\pi ^{_{\top }}]=paths(P)\).
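The ordering and the range membership test can be illustrated with a minimal Python sketch. The edge encoding (a tuple whose last component is the branch direction), the function names, and the use of `None` for the special paths \(\pi _{^{_\bot }}\) and \(\pi ^{_{\top }}\) are assumptions made for illustration only; they are not taken from the implementation.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical edge encoding: (source, operation, target, branch),
# where branch is "T", "F", or "N" as reported by the indicator function B_P.
Edge = Tuple[int, str, int, str]
Path = Sequence[Edge]


def path_leq(p: Path, q: Path) -> bool:
    """Return True iff p <= q in the path ordering of Sec. 2.2."""
    # k is the length of the longest common prefix of edges.
    k = 0
    while k < len(p) and k < len(q) and p[k] == q[k]:
        k += 1
    if k == len(p):
        # p is a prefix of q (or equal to q): n = k and m >= n.
        return True
    if k == len(q):
        # q is a proper prefix of p: neither disjunct of the definition holds.
        return False
    # The paths diverge at edge k+1: p must take the true-branch, q the false-branch.
    return p[k][3] == "T" and q[k][3] == "F"


def in_range(p: Path, lower: Optional[Path], upper: Optional[Path]) -> bool:
    """Membership in [lower, upper]; None stands for the unbounded side."""
    above = lower is None or path_leq(lower, p)
    below = upper is None or path_leq(p, upper)
    return above and below
```

Taking the longest common prefix as k is sufficient: any shorter k is followed by two identical edges, which cannot have different branch directions.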

As the program is assumed to be deterministic except for the input, a test case \(\tau \), \(\tau : \mathcal {V} \rightarrow \mathbb {Z}\), which maps each input variable to a concrete value, describes exactly a single path \(\pi \). We say that \(\tau \) induces \(\pi \) and write this path as \(\pi _\tau \). Consequently, we can define a range by two induced paths, i.e., as \([\pi _{\tau _1}, \pi _{\tau _2}]\) for test cases \(\tau _1\) and \(\tau _2\). For the example program from Fig. 1, two example test cases are \( {\tau _1} = \{x: 0, y: 2, z: 1\}\) and \({\tau _2} = \{x: 1, y: 0, z: 2\}\). The two induced paths are \(\pi _{\tau _1}\), with data state \(c_1=[x\mapsto 0, y \mapsto 2, z \mapsto 1]\), and \(\pi _{\tau _2}\), with data state \(c_2=[x\mapsto 1, y \mapsto 0, z \mapsto 2]\).
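To make the notion of an induced path concrete, the following Python sketch executes a test case and records the branch directions taken. The body of `mid_branches` is an assumption: it follows the common textbook shape of a middle-of-three function, since the program of Fig. 1 is not reproduced in the text, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def mid_branches(tau: Dict[str, int]) -> Tuple[List[str], int]:
    """Run an assumed version of mid on test case tau.

    Returns the sequence of branch directions taken ("T"/"F") and the result.
    """
    x, y, z = tau["x"], tau["y"], tau["z"]
    taken: List[str] = []

    def branch(cond: bool) -> bool:
        # Record the direction of every evaluated condition.
        taken.append("T" if cond else "F")
        return cond

    if branch(x < y):
        if branch(y < z):
            result = y
        elif branch(x < z):
            result = z
        else:
            result = x
    elif branch(x < z):
        result = x
    elif branch(y < z):
        result = z
    else:
        result = y
    return taken, result


tau1 = {"x": 0, "y": 2, "z": 1}
tau2 = {"x": 1, "y": 0, "z": 2}
print(mid_branches(tau1))  # (['T', 'F', 'T'], 1)
print(mid_branches(tau2))  # (['F', 'T'], 1)
```

Under this assumed program, the path of \(\tau _1\) starts with a true-branch and the path of \(\tau _2\) with a false-branch, so \(\pi _{\tau _1} \le \pi _{\tau _2}\) and \([\pi _{\tau _1}, \pi _{\tau _2}]\) is a non-empty range.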

2.3 Configurable Program Analysis

We will realize our ranged analysis using the configurable program analysis (CPA) framework [17]. This framework allows one to define customized, abstract-interpretation based analyses, i.e., it allows a selection of the abstract domain as well as a configuration for exploration. For the latter, one defines when and how to combine information and when to stop exploration. Formally, a CPA \(\mathbb {A}=(D,\rightsquigarrow , \textsf {merge}, \textsf {stop})\) consists of

  • the abstract domain \(D=(Loc \times C, (E,\top ,\sqsubseteq ,\sqcup ),\llbracket \cdot \rrbracket )\), which is composed of a set \(Loc \times C\) of program states, a join semi-lattice on the abstract states \(E\) as well as a concretization function, which fulfils that

    $$\begin{aligned} \llbracket \top \rrbracket =Loc \times C \mathrm {~and~} \forall e, e'\in E: \llbracket e\rrbracket \cup \llbracket e'\rrbracket \subseteq \llbracket e\sqcup e'\rrbracket \end{aligned}$$
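  • the transfer relation \(\mathord {\rightsquigarrow } \subseteq E\times \mathcal {G} \times E\) defining the abstract semantics, which safely overapproximates the program semantics, i.e., every concrete successor along an edge \(g\) of a state represented by an abstract state \(e\) is represented by some abstract successor of \(e\) for \(g\)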
  • the merge operator \(\textsf {merge}: E\times E\rightarrow E\) used to combine information that satisfies

    $$\begin{aligned} \forall e, e'\in E: e'\sqsubseteq \textsf {merge}(e,e') \end{aligned}$$
  • the termination check \(\textsf {stop}: E\times 2^E\rightarrow \mathbb {B}\) that decides whether the exploration of an abstract state can be omitted and that fulfils

    $$\begin{aligned} \forall e\in E, E_\textrm{sub}\subseteq E: \textsf {stop}(e,E_\textrm{sub})\implies \llbracket e\rrbracket \subseteq \bigcup _{e'\in E_\textrm{sub}}\llbracket e'\rrbracket \end{aligned}$$

To run the configured analysis, one executes a meta reachability analysis, the so-called CPA algorithm, configured by the CPA and provides an initial value \(e_{init} \in E\) which the analysis will start with. For details on the CPA algorithm, we refer the reader to [17].
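For orientation, the following Python sketch shows the usual worklist formulation of such a reachability algorithm. It is a simplified illustration, not the CPAchecker implementation: the function names are hypothetical, abstract states are assumed to be hashable, and the transfer relation is passed as a successor function.

```python
from typing import Callable, Hashable, Iterable, List, Set


def cpa_algorithm(
    e_init: Hashable,
    successors: Callable[[Hashable], Iterable[Hashable]],
    merge: Callable[[Hashable, Hashable], Hashable],
    stop: Callable[[Hashable, Set[Hashable]], bool],
) -> Set[Hashable]:
    """Compute the set of reached abstract states, starting from e_init."""
    reached: Set[Hashable] = {e_init}
    waitlist: List[Hashable] = [e_init]
    while waitlist:
        e = waitlist.pop()
        for e_succ in successors(e):
            # Combine the new state with already reached states.
            for e_old in list(reached):
                e_new = merge(e_succ, e_old)
                if e_new != e_old:
                    reached.discard(e_old)
                    reached.add(e_new)
                    if e_old in waitlist:
                        waitlist.remove(e_old)
                    if e_new not in waitlist:
                        waitlist.append(e_new)
            # Explore the successor only if it is not already covered.
            if not stop(e_succ, reached):
                reached.add(e_succ)
                waitlist.append(e_succ)
    return reached
```

With a merge operator that always returns its second argument and a stop operator that checks membership in the reached set, the sketch degenerates to plain graph reachability.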

As part of our ranged analysis, we use the abstract domain and transfer relation of a CPA \(\mathbb {V}\) for value analysis [9] (also known as constant propagation or explicit analysis). An abstract state \(v\) of the value analysis ignores program locations and maps each variable to either a concrete value of its domain or \(\top \), which represents any value. The partial order \(\sqsubseteq _\mathbb {V}\) and the join operator \(\sqcup _\mathbb {V}\) are defined variable-wise while ensuring that \(v\sqsubseteq _\mathbb {V} v' \Leftrightarrow \forall \texttt{v}\in \mathcal {V}: v(\texttt{v})=v'(\texttt{v}) \vee v'(\texttt{v})=\top \) and \((v\sqcup _\mathbb {V} v')(\texttt{v})=v(\texttt{v})\) if \(v(\texttt{v})=v'(\texttt{v})\) and otherwise \((v\sqcup _\mathbb {V} v')(\texttt{v})=\top \). The concretization of abstract state \(v\) contains all concrete states that agree on the concrete variable values, i.e., \(\llbracket v\rrbracket _\mathbb {V}:=\left\{ (\ell ,c)\in Loc \times C\mid \forall \texttt{v}\in \mathcal {V}: v(\texttt{v})=\top \vee v(\texttt{v})=c (\texttt{v})\right\} \). If the values for all relevant variables are known, the transfer relation \( \overset{{}}{\rightsquigarrow }_{\mathbb {V}}\) will behave like the program semantics. Otherwise, it may overapproximate the executability of a CFA edge and may assign value \(\top \) if a concrete value cannot be determined.
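The variable-wise definitions of the value domain can be written down directly. The sketch below is a minimal Python illustration under the assumption that abstract states are dictionaries over the same set of variables and that a sentinel object `TOP` stands for \(\top \); all names are hypothetical.

```python
from typing import Any, Dict

TOP = object()  # sentinel for "any value"

State = Dict[str, Any]


def leq(v: State, w: State) -> bool:
    """v is at most w: each variable agrees, or w maps it to TOP."""
    return all(w[x] is TOP or v[x] == w[x] for x in v)


def join(v: State, w: State) -> State:
    """Variable-wise join: keep a value only if both states agree on it."""
    return {x: v[x] if v[x] == w[x] else TOP for x in v}


def in_concretization(v: State, c: Dict[str, int]) -> bool:
    """A data state c is represented by v if it agrees on all non-TOP values."""
    return all(v[x] is TOP or v[x] == c[x] for x in v)


v = {"x": 1, "y": 2}
w = {"x": 1, "y": 3}
j = join(v, w)  # x stays 1, y becomes TOP
```

In this example both `v` and `w` are below their join `j`, and `j` represents every data state with x = 1 regardless of the value of y.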

To easily build ranged analysis instances for various program analyses, we modularize our ranged analysis into a range reduction and a program analysis. Technically, we will compose a ranged analysis from different CPAs using the concept of a composite CPA [17]. We demonstrate the composition for two CPAs. The composition of more than two CPAs works analogously or can be achieved by recursively composing two (composite) CPAs. A composite CPA \(\mathbb {A}_\times =(D_\times ,\overset{{}}{\rightsquigarrow }_{\times }, \textsf {merge}_\times , \textsf {stop}_\times )\) of CPA \(\mathbb {A}_1=((Loc \times C, (E_1,\top _1,\sqsubseteq _1,\sqcup _1),\llbracket \cdot \rrbracket _1),\overset{{}}{\rightsquigarrow }_{1}, \textsf {merge}_1, \textsf {stop}_1)\) and CPA \(\mathbb {A}_2=((Loc \times C, (E_2,\top _2,\sqsubseteq _2,\sqcup _2),\llbracket \cdot \rrbracket _2),\overset{{}}{\rightsquigarrow }_{2}, \textsf {merge}_2, \textsf {stop}_2)\) considers the product domain \(D_\times =(Loc \times C,(E_1\times E_2, (\top _1,\top _2),\sqsubseteq _\times ,\sqcup _\times ), \llbracket \cdot \rrbracket _\times )\) that defines the operators elementwise, i.e., \((e_1,e_2)\sqsubseteq _\times (e'_1, e'_2)\) if \(e_1\sqsubseteq _1 e'_1\) and \(e_2\sqsubseteq _2 e'_2\), \((e_1,e_2)\sqcup _\times (e'_1, e'_2)=(e_1\sqcup _1 e'_1, e_2\sqcup _2 e'_2)\), and \(\llbracket (e_1, e_2)\rrbracket _\times =\llbracket e_1\rrbracket _1\cap \llbracket e_2\rrbracket _2\). The transfer relation may be the product transfer relation or may strengthen the product transfer relation using knowledge about the other abstract successor. In contrast, \(\textsf {merge}_\times \) and \(\textsf {stop}_\times \) cannot be derived and must always be defined.

3 Composition of Ranged Analyses

Fig. 2. Composition of three ranged analyses (in orange)

In this section, we introduce the composition of ranged analyses as a generalization of ranged symbolic execution to arbitrary program analyses. The overall goal is to split the program paths into multiple disjoint ranges each of which is being analysed by a (different) program analysis. Therein, the task of a program analysis is to verify whether a program fulfils a given specification. Specifications are often given in the form of error locations, so that the task is proving the unreachability of error locations. The results for the verification task contain a verdict and potentially an additional witness (a justification or a concrete path violating the specification [14]). The verdict indicates whether the program fulfils the specification (verdict “true”), violates it (verdict “false”) or if the analysis did not compute a result (verdict “unknown”).

To ensure that an arbitrary program analysis operates on paths within a given range only, we employ ranged analysis. A ranged analysis is realized as a composition of an arbitrary program analysis (a CPA) and a range reduction (also given as a CPA below) ensuring path exploration to stay within the range. Then, a composition of ranged analyses is obtained by (1) splitting the program into ranges, (2) then running several ranged analyses in parallel, and (3) at the end aggregating analysis results (see Fig. 2). Splitting is described in Sec. 4. For aggregation, we simply return the verdict “false” whenever one analysis returns “false”, we return “unknown” whenever no analysis returns “false” and one analysis returns “unknown” or aborts, otherwise we return “true”. We do not support aggregation of witnesses yet (but this could be realized similar to [70]).
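The aggregation rule can be stated in a few lines of Python. This is a sketch of the rule as described above; the function name and the encoding of verdicts as strings (with any value other than "true" or "false", such as "unknown" or "abort", treated as inconclusive) are assumptions for illustration.

```python
from typing import Iterable


def aggregate(verdicts: Iterable[str]) -> str:
    """Combine the verdicts of all ranged analyses into one overall verdict."""
    results = list(verdicts)
    # A single violation found in any range is a violation of the program.
    if "false" in results:
        return "false"
    # Without a violation, any inconclusive or aborted analysis prevents a proof.
    if any(v != "true" for v in results):
        return "unknown"
    # All ranges were proven safe.
    return "true"


print(aggregate(["true", "false", "unknown"]))  # false
print(aggregate(["true", "unknown"]))           # unknown
print(aggregate(["true", "true", "true"]))      # true
```

The order of the checks matters: "false" takes precedence over "unknown", because one reachable error location in one range suffices to refute the specification.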

Fig. 3. Application of range reduction on the running example of Fig. 1

3.1 Ranged Analysis

Next, we define ranged analysis as a CPA composition of the target program analysis and the novel range reduction. The range reduction decides whether a path is included in a range \([\pi _{\tau _1},\pi _{\tau _2}]\) and limits path exploration to this range. We decompose the range reduction for \([\pi _{\tau _1},\pi _{\tau _2}]\) into a composition of two specialized range reductions \(\mathbb {R}_{[\pi _{\tau _1}, \pi ^{_{\top }}]}\) and \(\mathbb {R}_{[\pi _{^{_\bot }},\pi _{\tau _2}]}\), which decide whether a path is in the range \([\pi _{\tau _1}, \pi ^{_{\top }}]\) and \([\pi _{^{_\bot }},\pi _{\tau _2}]\), respectively. Since \([\pi _{\tau _1}, \pi _{\tau _2}] = [\pi _{\tau _1}, \pi ^{_{\top }}] \cap [\pi _{^{_\bot }}, \pi _{\tau _2}]\) and the composition stops the exploration of a path if one analysis returns \(\bot \), the composite analysis \(\mathbb {R}_{[\pi _{\tau _1}, \pi _{\tau _2}]}=\mathbb {R}_{[\pi _{\tau _1}, \pi ^{_{\top }}]}\times \mathbb {R}_{[\pi _{^{_\bot }},\pi _{\tau _2}]}\) only explores paths that are included in both ranges (which are exactly the paths in \([\pi _{\tau _1}, \pi _{\tau _2}]\)). Figure 3 depicts the application of range reduction to the example from Fig. 1, where the range reduction \(\mathbb {R}_{[\pi _{^{_\bot }},\pi _{\tau _2}]}\) is depicted in Fig. 3a, \(\mathbb {R}_{[\pi _{\tau _1}, \pi ^{_{\top }}]}\) in Fig. 3b, and the composition of both range reductions in Fig. 3c. Finally, the ranged analysis of an arbitrary program analysis \(\mathbb {A}\) in a given range \([\pi _{\tau _1}, \pi _{\tau _2}]\) can be represented as a composition:

$$\begin{aligned} \mathbb {R}_{[\pi _{\tau _1}, \pi ^{_{\top }}]}\times \mathbb {R}_{[\pi _{^{_\bot }},\pi _{\tau _2}]}\times \mathbb {A} \end{aligned}$$

For \(\mathbb {R}_{[\pi _{\tau _1}, \pi _{\tau _2}]}\), we define \(\textsf {merge}_\times \) component-wise for the individual merge operators and \(\textsf {stop}_\times \) as conjunction of the individual stop operators. As soon as the range reduction decides that a path \(\pi \) is not contained in the range \([\pi _{\tau _1}, \pi _{\tau _2}]\) and returns \(\bot \), the exploration of the path stops for all analyses defined in the composition.
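A minimal Python sketch of such a composition is given below. It rests on simplifying assumptions that are not taken from the implementation: composite states are tuples, each component offers a deterministic successor function, `None` plays the role of \(\bot \), and the conjunction of stop operators is read as "some reached composite state covers every component". All names are hypothetical.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

BOT = None  # stands for the unreachable state

Composite = Tuple[Any, ...]


def composite_successor(
    state: Composite, edge: Any, transfers: Sequence[Callable[[Any, Any], Any]]
) -> Optional[Composite]:
    """Product successor; exploration stops as soon as one component yields BOT."""
    succ: List[Any] = []
    for component, transfer in zip(state, transfers):
        s = transfer(component, edge)
        if s is BOT:
            return BOT
        succ.append(s)
    return tuple(succ)


def composite_merge(
    new: Composite, old: Composite, merges: Sequence[Callable[[Any, Any], Any]]
) -> Composite:
    """Component-wise application of the individual merge operators."""
    return tuple(m(a, b) for m, a, b in zip(merges, new, old))


def composite_stop(
    state: Composite,
    reached: Sequence[Composite],
    stops: Sequence[Callable[[Any, Sequence[Any]], bool]],
) -> bool:
    """True if one reached composite state covers state in every component."""
    return any(
        all(stop(c, [r_c]) for stop, c, r_c in zip(stops, state, r))
        for r in reached
    )
```

The early return in `composite_successor` reflects the behaviour described above: once the range reduction leaves the range, no component continues on that path.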

3.2 Range Reduction as CPA

Next, we define the range reduction \(\mathbb {R}_{[\pi _{\tau _1}, \pi ^{_{\top }}]}\) (\(\mathbb {R}_{[\pi _{^{_\bot }},\pi _{\tau _2}]}\), respectively) as a CPA, which tracks whether a state is reached via a path in \([\pi _{\tau _1}, \pi ^{_{\top }}]\) (\([\pi _{^{_\bot }}, \pi _{\tau _2}]\)).

Initialisation. To define the CPAs for \(\mathbb {R}_{[\pi _{\tau _1}, \pi ^{_{\top }}]}\) and \(\mathbb {R}_{[\pi _{^{_\bot }},\pi _{\tau _2}]}\), we reuse components of the value analysis \(\mathbb {V} \) (as described in Sec. 2.3). A value analysis explores at least all feasible paths of a program by tracking the values for program variables. If the program behaviour is fully determined (i.e., all (input) variables are set to constants), then only one feasible, maximal path exists, which is explored by the value analysis. We exploit this behaviour by initializing the analysis based on our test case \(\tau \) (being a lower or upper bound of a range):

$$\begin{aligned} e_{ INIT } = v \text { with } v(x) = \left\{ \begin{array}{ll} \tau (x) &amp; \text { if } x \in dom(\tau ) \\ \top &amp; \text { otherwise} \\ \end{array} \right. \text { for all } x \in \mathcal {V} \end{aligned}$$

In this case, all variables which are typically undetermined and dependent on the program input now have a determined value, defined through the test case. As the behaviour of the program under the test case \(\tau \) is now fully determined, the value analysis only explores a single path \(\pi _\tau \), which corresponds to the execution trace of the program given the test case. Now, as we are interested in all paths defined in a range and not only a single path, we adapt the value analysis as follows:

Lower Bound CPA. For the CPA range reduction \(\mathbb {R}_{[\pi _{\tau _1}, \pi ^{_{\top }}]}\) we borrow all components of the value analysis except for the transfer relation \(\overset{{}}{\rightsquigarrow }_{\tau _1}\). The transfer relation \(\overset{{}}{\rightsquigarrow }_{\tau _1}\) is defined as follows:

$$(v, g, v') \in \mathord {\overset{}{\rightsquigarrow }_{\tau _1}} \text { iff }{\left\{ \begin{array}{ll} v=\top \wedge v' = \top , \text {or} \\ v \ne \top \wedge \ v' = \top \wedge B _P (g) = F \wedge (v,g,\bot ) \in \mathord {\overset{}{\rightsquigarrow }_{\mathbb {V}}}, \text {or} \\ v\ne \top \wedge \big ( v' \ne \bot \vee B _P (g) \ne F \big ) \wedge (v,g,v') \in \mathord {\overset{}{\rightsquigarrow }_{\mathbb {V}}} \end{array}\right. }$$

Note that \(\top \) represents the value analysis state where no information on variables is stored and \(\bot \) represents an unreachable state in the value analysis, which stops the exploration of the path. Hence, the second case ensures that \(\mathbb {R}_{[\pi _{\tau _1}, \pi ^{_{\top }}]}\) also visits the false-branch of a condition when the path induced by \(\tau _1\) follows the true-branch. Note that if \(\overset{}{\rightsquigarrow }_{\mathbb {V}}\) computes \(\bot \) as the successor state for an assume edge \(g\) with \(B _P (g) = T\), the exploration of the path is stopped, as \(\pi _{\tau _1}\) follows the false-branch (covered by the third case).

Upper Bound CPA. For the CPA range reduction \(\mathbb {R}_{[\pi _{^{_\bot }},\pi _{\tau _2}]}\) we again borrow all components of the value analysis except for the transfer relation \(\overset{{}}{\rightsquigarrow }_{\tau _2}\). The transfer relation \(\overset{{}}{\rightsquigarrow }_{\tau _2}\) is defined as follows:

$$(v,g, v') \in \mathord {\overset{}{\rightsquigarrow }_{\tau _2}} \text { iff } {\left\{ \begin{array}{ll} v=\top \wedge v' = \top , \text {or} \\ v \ne \top \wedge v' = \top \wedge B _P (g) = T \wedge (v,g,\bot ) \in \mathord {\overset{}{\rightsquigarrow }_{\mathbb {V}}}, \text {or} \\ v\ne \top \wedge \big ( v' \ne \bot \vee B _P (g) \ne T \big ) \wedge (v,g,v') \in \mathord {\overset{}{\rightsquigarrow }_{\mathbb {V}}} \end{array}\right. }$$

The second condition now ensures that \(\mathbb {R}_{[\pi _{^{_\bot }},\pi _{\tau _2}]}\) also visits the true-branch of a condition when \(\pi _{\tau _2}\) follows the false-branch.
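Both transfer relations differ only in the branch direction on which they open up to \(\top \), which the following Python sketch makes explicit. It is a simplification under stated assumptions: only assume edges are modelled, an edge is a pair of a branch direction and a predicate over the tracked values, the value analysis successor is deterministic, and the strings `"TOP"` and `"BOT"` stand for \(\top \) and \(\bot \). All names are hypothetical.

```python
from typing import Any, Callable, Dict, Tuple, Union

TOP = "TOP"  # no information tracked: every continuation lies inside the range
BOT = "BOT"  # unreachable: the path has left the range

Value = Union[str, Dict[str, int]]
Edge = Tuple[str, Callable[[Dict[str, int]], bool]]  # (branch, condition)


def value_succ(v: Dict[str, int], edge: Edge) -> Value:
    """Value analysis on an assume edge with fully determined values."""
    _, cond = edge
    return v if cond(v) else BOT


def make_range_transfer(open_branch: str) -> Callable[[Value, Edge], Value]:
    """open_branch = "F" gives the lower bound CPA, "T" the upper bound CPA."""

    def transfer(v: Value, edge: Edge) -> Value:
        if v == TOP:
            return TOP  # first case
        succ = value_succ(v, edge)
        if succ == BOT and edge[0] == open_branch:
            return TOP  # second case: the bound path goes the other way
        return succ  # third case: follow the bound path, or stop with BOT

    return transfer


lower = make_range_transfer("F")  # for [pi_tau1, pi_top]
upper = make_range_transfer("T")  # for [pi_bot, pi_tau2]
```

For a bound whose path takes the true-branch of a condition, `lower` keeps tracking values on the true-branch and returns `TOP` on the false-branch, whereas `upper` keeps tracking on the true-branch and returns `BOT` on the false-branch.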

3.3 Handling Underspecified Test Cases

So far, we have assumed that test cases are fully specified, i.e., contain values for all input variables, and the behaviour of the program is deterministic such that executing a test case \(\tau \) follows a single (maximal) execution path \(\pi _\tau \). However, in practice, we observe that test cases can be underspecified such that a test case \(\tau \) does not provide concrete values for all input variables. We denote by \(P_\tau \) the set of all paths that are then induced by \(\tau \). In this case, we define:

$$ [\pi _{^{_\bot }}, P_\tau ] = \{\pi \mid \forall \pi ^\prime \in P_\tau : \pi \le \pi ^\prime \} = \{\pi \mid \pi \le \text {min}(P_\tau )\} $$


$$\begin{aligned}{}[P_\tau , \pi ^{_{\top }}] = \{\pi \mid \exists \pi ^\prime \in P_\tau : \pi ^\prime \le \pi \} = \{\pi \mid \text {min}(P_\tau ) \le \pi \} \end{aligned}$$

Interestingly enough, by defining \(\pi _\tau = \text {min}(P_\tau )\) for an underspecified test case \(\tau \), we can handle the range as if \(\tau \) were fully specified.

4 Splitting

A crucial part of the ranged analysis is the generation of ranges, i.e., the splitting of programs into parts that can be analysed in parallel. The splitting has to compute either two paths or two test cases, which together define one range. Ranged symbolic execution [86] employs a random strategy for range generation (together with an online work-stealing concept to balance work among different workers). For the work here, we have also implemented this random strategy, selecting random paths in the execution tree to make up ranges. In addition, we propose a novel strategy based on the number of loop unrollings. Both strategies are designed to work “on-the-fly”, meaning that neither requires building the full execution tree upfront; rather, they only compute the paths or test cases that are used to fix a range. Next, we explain both strategies in more detail, especially how they are used to generate more than two ranges.

Bounding the Number of Loop Unrollings (Lb). Given a loop bound \(i \in \mathbb {N}\), the splitting computes the left-most path in the program that contains exactly i unrollings of the loop. If the program contains nested loops, each nested loop is unrolled i times in each iteration of the outer loop. For the computed path, we (1) build its path formula using the strongest post-condition operator [46], (2) query an SMT-solver for satisfiability, and (3) in case of an answer SAT, use the evaluation of the input variables in the path formula as one test case. If the path formula is unsatisfiable, we iteratively remove the last statement from the path until a satisfiable path formula is found. A test case \(\tau \) determined in this way defines two ranges, namely \([\pi _{^{_\bot }},\pi _\tau ]\) and \([\pi _\tau ,\pi ^{_{\top }}]\). If the program is loop-free, the generation of a test case fails and we generate a single range \([\pi _{^{_\bot }}, \pi ^{_{\top }}]\). In the experiments, we used the loop bounds 3 (called Lb3) and 10 (called Lb10) with two ranges each. To compute more than two ranges, we use intervals of loop bounds.

Generating Ranges Randomly (Rdm). The second splitting strategy selects the desired number of paths randomly. At each assume edge in the program (either a loop head or an if statement), it follows either the true- or the false-branch with a probability of 50%, until it reaches a node in the CFA without successor. Again, we compute the path formula for that path and build a test case. This purely random approach is called Rdm.

Selecting the true- or the false-branch with the same probability may lead to fairly short paths with few loop iterations. As the execution tree of a program is often not balanced, it rather grows to the left (true-branches). Thus, we used a second strategy based on random walks, which takes the true-branch with a probability of 90%. We call this strategy Rdm9.
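The random walk underlying both variants can be sketched in Python as follows. The CFA encoding (a dictionary from nodes to outgoing edges, each edge a pair of branch direction and target node), the function name, and the `max_steps` guard against non-terminating walks are assumptions for illustration; the subsequent construction of a path formula and a test case is not shown.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[str, int]  # (branch in {"T", "F", "N"}, target node)


def random_path(
    cfa: Dict[int, List[Edge]],
    start: int,
    p_true: float,
    rng: random.Random,
    max_steps: int = 10_000,
) -> List[Edge]:
    """Walk from start until a node without successor (or max_steps) is reached."""
    path: List[Edge] = []
    node = start
    for _ in range(max_steps):
        edges = cfa.get(node, [])
        if not edges:
            break  # node without successor: the walk ends
        if len(edges) == 1:
            edge = edges[0]  # no branching at this node
        else:
            # Assume edge: pick the true-branch with probability p_true.
            wanted = "T" if rng.random() < p_true else "F"
            edge = next(e for e in edges if e[0] == wanted)
        path.append(edge)
        node = edge[1]
    return path


# A single loop: node 0 is the loop head, node 1 the body, node 2 the exit.
loop_cfa = {0: [("T", 1), ("F", 2)], 1: [("N", 0)], 2: []}
rdm = random_path(loop_cfa, 0, 0.5, random.Random(0))   # in the spirit of Rdm
rdm9 = random_path(loop_cfa, 0, 0.9, random.Random(0))  # in the spirit of Rdm9
```

With `p_true = 0.5` the expected number of loop iterations in this example is one, whereas `p_true = 0.9` yields nine on average, which matches the motivation for the second variant.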

5 Implementation

To show the advantages of the composition of ranged analyses, especially the possibility of running conceptually different analyses on different ranges of a program, we realized the range reduction from Sec. 3.2 and the ranged analyses in the tool CPAchecker  [21]. The realization of the range reduction follows our formalization, i.e., it reuses elements from the value analysis, which are already implemented within CPAchecker.

Fig. 4. Construction of a ranged analysis from an off-the-shelf program analysis

Due to the composite pattern, we can build a ranged analysis as composition of range reduction and any existing program analysis within CPAchecker with nearly no effort. We can also use other (non CPA-based) off-the-shelf analyses by employing the construction depicted in Fig. 4: Instead of running the analysis in parallel with the range reduction CPA, we can build a sequential composition of the range reduction and the analysis itself. As off-the-shelf tools take programs as inputs, not ranges, we first construct a reduced program, which by construction only contains the paths within the given range. For this, we can use the existing residual program generation within CPAchecker  [19].

The composition of ranged analyses from Sec. 3 is realized using the tool CoVeriTeam [20]. CoVeriTeam allows building parallel and sequential compositions using existing program analyses, like the ones of CPAchecker. We use CoVeriTeam for the orchestration of the composition of ranged analyses. The implementation follows the structure depicted in Fig. 2 and also contains the Aggregation component. It is configured with the program analyses \(\mathbb {A}_1, \cdots , \mathbb {A}_n\) and a splitting component. For splitting, we realized the splitters Lb3, Lb10, Rdm and Rdm9 in CPAchecker. Each splitter generates test cases in the standardized XML-based TEST-Comp test case format. If the splitter fails (e.g., Lb3 cannot compute a test case if the program does not contain a loop), our implementation executes the analysis \(\mathbb {A}_1\) on the interval \([\pi _{^{_\bot }}, \pi ^{_{\top }}]\). For the evaluation, we used combinations of three existing program analyses within the ranged analysis, briefly introduced next.

Symbolic Execution. Symbolic execution [73] analyses program paths based on symbolic inputs. Here, states are pairs of a symbolic store, which describes variable values by formulae on the symbolic inputs, and a path condition, which tracks the executability of the path. Operations update the symbolic store and at branching points the path condition is extended by the symbolic evaluation of the branching condition. Furthermore, the exploration of a path is stopped when it reaches the program end or its path condition becomes unsatisfiable.

Predicate Analysis. We use CPAchecker’s standard predicate analysis, which is configured to perform model checking and predicate abstraction with adjustable block encoding [22] such that it abstracts at loop heads only. The required set of predicates is determined by counterexample-guided abstraction refinement [35], lazy refinement [64], and interpolation [63].

Bounded Model Checking. We use iterative bounded model checking (BMC). Each iteration inspects the behaviour of the CFA unrolled up to loop bound k and increases the loop bound in case no property violation was detected. To inspect the behaviour, BMC first encodes the unrolled CFA and the property in a formula using the unified SMT-based approach for software verification [15]. Thereafter, it checks the satisfiability of the formula encoding to detect property violations.

For the evaluation, we built four different basic configurations and employed our different range splitters: Ra-2Se and Ra-3Se, which employ two and three instances of symbolic execution in parallel, respectively; Ra-2bmc, which employs two instances of BMC; and Ra-Se-Pred, which uses symbolic execution for the range \([\pi _{^{_\bot }},\pi _\tau ]\) and predicate analysis on \([\pi _\tau ,\pi ^{_{\top }}]\) for some computed test input \(\tau \).

6 Evaluation

Siddiqui and Khurshid concentrated their evaluation on the issue of scaling, i.e., showing that a certain speed-up can be achieved by ranged execution [86]. More specifically, they showed that ranged symbolic execution can speed-up path exploration when employing ten workers operating on ranges in parallel. In contrast, our interest was not in scaling issues only, but also in the obtained verification results. We in particular wanted to find out whether a ranged analysis can obtain more results for verification tasks than analyses in isolation would achieve within the same resource limitations. Furthermore, our evaluation is different to [86] in that we limit the available CPU time, meaning that both analyses, the default analysis and the composition of ranged analyses, have the same resources and that we employ different analyses. Finally, we were interested in evaluating our novel splitting strategy, in particular in comparison to the existing random strategy. To this end, we studied the following research questions:

  • RQ1 Can a composition of ranged analyses, in particular our novel splitting strategy, increase the efficiency and effectiveness of symbolic execution?

  • RQ2 Can other analyses also benefit from using a composition of ranged analyses, in particular combinations of different analyses?

6.1 Evaluation Setup

All experiments were run on machines with an Intel Xeon E3-1230 v5 @ 3.40 GHz (8 cores), 33 GB of memory, and Ubuntu 20.04 LTS with Linux kernel 5.4.0. We use BenchExec [23] for the execution of our experiments to increase the reproducibility of the results. In a verification run, a tool-configuration is given a task (a program plus specification) and either computes a proof (if the program fulfils the specification) or raises an alarm (if the specification is violated on the program). We limit each verification run to 15 GB of memory, 4 CPU cores, and 15 min of CPU time, yielding a setup that is comparable to the one used in SV-Comp. The evaluation is conducted on a subset of the SV-Benchmarks used in SV-Comp, and all experiments were conducted once. The subset contains in total 5 400 C-tasks from all sub-categories of the SV-Comp category reach-safety [36]. The specification for this category, and hence for these tasks, states that all calls to the function reach_error are unreachable. Each task comes with a ground truth stating whether the task fulfils the specification (3 194 tasks) or not (2 206 tasks). All data collected is available in our supplementary artefact [60].

6.2 RQ 1: Composition of Ranged Analyses for Symbolic Execution

Evaluation Plan. To analyse the performance of symbolic execution in a composition of ranged analyses, we compare the effectiveness (number of tasks solved) and efficiency (time taken to solve a task) of compositions of two and three ranged analyses, each using symbolic execution with one of the four splitters from Sec. 5, against standalone symbolic execution. For efficiency, we compare the consumed CPU time as well as the (real) time taken overall to solve the task (called wall time). The CPU time is always limited for the full configuration, such that an instance combining two ranged analyses in parallel also has only 900 s of CPU time available, hence at most 450 s per ranged analysis. To achieve a fair comparison, we also executed symbolic execution in CoVeriTeam, where we built a simple configuration that directly calls CPAchecker running its symbolic execution.

Table 1. Number of correct and incorrect verdicts reported by SymbExec and compositions of ranged analyses with symbolic executions using different splitters

Effectiveness. Table 1 compares the verdicts of symbolic execution (SymbExec) and the configurations using a composition of ranged analyses with one range (and thus two analyses in parallel, called Ra-2Se) or with two ranges (and three analyses, called Ra-3Se). The table shows the number of overall correct verdicts reported (divided into the number of correct proofs and correct alarms), the number of correct verdicts additionally reported compared to SymbExec, as well as the number of incorrect proofs and alarms reported. First of all, we observe that all configurations using a composition of ranged analyses compute more correct verdicts than SymbExec alone. We see the largest increase for Ra-2Se-Lb3, where 116 tasks are additionally solved. This increase comes nearly exclusively from the fact that Ra-2Se-Lb3 computes more correct alarms. The number of reported proofs does not change significantly, as SymbExec and all configurations of the composition of ranged analyses have to check the same number of paths in the program leading to a property violation (namely all) for being infeasible. Thus, all need to do “the same amount of work” to compute a proof. As the available CPU time is identical in both settings, the ranged analyses do not compute additional proofs by sharing work. In contrast, for computing an alarm, finding a single path that violates the specification suffices. Thus, using two symbolic execution analyses in parallel working on different parts of the program increases the chance of finding such a violating path. All configurations employing the composition of ranged analyses compute a few more false alarms. For these tasks, SymbExec runs into a timeout and would also compute a false alarm if its time limit were increased.

For configurations using three symbolic executions in parallel, we used three splitters: Ra-3Se-Lb, which uses both loop-bound splitters in parallel, i.e., we have the ranges with fewer than three loop unrollings, three to ten loop unrollings, and more than ten loop unrollings, and Ra-3Se-Rdm resp. Ra-3Se-Rdm9, which both employ the random splitting to generate two ranges. Again, all configurations can compute more correct alarms compared to SymbExec, even more than Ra-2Se-Lb3. Splitting the state space into even more parts that are analysed in parallel further increases the chance of finding an alarm.

Finally, when comparing the effectiveness of the different strategies employed to generate bounds, we observe that splitting the program using our novel component Lb3 is more effective than using a randomly generated bound when using two and three symbolic execution analyses in parallel.

Fig. 5. Scatter plot comparing SymbExec and Ra-2Se-Lb3

Fig. 6. Median factor of time increase for different configurations of Ra-2Se

Efficiency. To assess the efficiency of compositions of ranged analyses, we compare the CPU time and the wall time taken to compute a correct solution by SymbExec and several configurations of ranged analysis. We excluded all tasks where the generation of the ranges fails, as SymbExec and the composition of ranged analyses behave equally in these cases. In general, all configurations consume overall approximately as much CPU time as SymbExec to solve all tasks and are even faster w.r.t. wall time. The scatter plot in Fig. 5 visualizes, on a log scale, the CPU time consumed to compute a result by SymbExec (on the x-axis) and by Ra-2Se-Lb3 (on the y-axis), for tasks solved correctly by both analyses. It indicates that for tasks solved quickly, Ra-2Se-Lb3 requires more time than SymbExec, as most points lie above the diagonal, and that the difference gets smaller the longer the analyses run.

We present a more detailed analysis of the efficiency in Fig. 6a and 6b. Each of the bar plots represents the median factor of the increase in the run time for tasks that are solved by SymbExec within the time interval that is given on the x-axis. If, for example, SymbExec solves all tasks in five CPU seconds and Ra-2Se-Lb3 in six CPU seconds, the factor is 1.2; if SymbExec takes five CPU seconds and Ra-2Se-Lb3 only three, the factor is 0.6. The width of the bars corresponds to the number of tasks within the interval. Figure 6a visualizes the comparison of the CPU time for Ra-2Se-Lb3 and SymbExec. For Ra-2Se-Lb3, the median and average increase is 1.6 for all tasks. Taking a closer look, in the median it takes twice as long to solve tasks which are solved by SymbExec within at most ten CPU seconds. Generating the ranges is done for the vast majority of all tasks within a few seconds. For tasks that can be solved in fewer than ten CPU seconds, the nearly constant cost for generating the ranges that is present in each run of Ra-2Se-Lb3 has a large impact on both the CPU and the wall time taken. Most importantly, the impact gets smaller the longer the analyses need to compute the result (the factor is constantly decreasing). For tasks that are solved by SymbExec in more than 50 CPU seconds, Ra-2Se-Lb3 is as fast as SymbExec; for tasks solved in more than 100 CPU seconds, it is 20% faster. As stated above, the CPU time consumed to compute a proof is not affected by parallelization. Thus, when only looking at the time taken to compute a proof, Ra-2Se-Lb3 takes as long as SymbExec after 50 CPU seconds. In contrast, Ra-2Se-Lb3 is faster for finding alarms in that interval. A more detailed analysis can be found in the artefact [60].
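The quantity shown in the bar plots can be sketched as follows. This is a minimal illustration, assuming that the factor is computed per task as the run time of the ranged configuration divided by the run time of the baseline and that tasks are grouped by the baseline time. The function name `median_factors`, the interval boundaries, and the sample timings are hypothetical and not taken from the evaluation data.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple


def median_factors(
    times: Sequence[Tuple[float, float]],
    edges: Sequence[float],
) -> Dict[Tuple[float, float], float]:
    """Median run-time factor per baseline-time interval.

    times: (baseline_seconds, ranged_seconds) for tasks solved by both.
    edges: ascending lower boundaries of the intervals; the last interval
           is open-ended. Intervals without tasks are omitted.
    """
    uppers: List[float] = list(edges[1:]) + [float("inf")]
    result: Dict[Tuple[float, float], float] = {}
    for lo, hi in zip(edges, uppers):
        factors = [ranged / base for base, ranged in times if lo <= base < hi]
        if factors:
            result[(lo, hi)] = median(factors)
    return result


# Hypothetical timings: the first two tasks reproduce the factors 1.2 and 0.6
# from the example in the text; the third is a long-running task.
sample = [(5.0, 6.0), (5.0, 3.0), (120.0, 96.0)]
factors = median_factors(sample, edges=[0.0, 10.0, 100.0])
```

A factor above 1 means that the ranged configuration is slower than the baseline on the tasks of that interval, and a factor below 1 means that it is faster.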

When comparing the wall time in Fig. 6b, the positive effect of the parallelization employed in all configurations of a composition of ranged analyses becomes visible. Ra-2Se-Lb3 is faster than SymbExec when SymbExec takes more than 20 seconds in real time to solve the task. To emphasize the effect of the parallelization, we used pre-computed ranges for Ra-2Se-Lb3. Now, Ra-2Se-Lb3 takes only 1.1 times the wall time of SymbExec in the median, and is equally fast or faster for all tasks solved in more than ten seconds.

Table 2. Number of correct and incorrect verdicts reported by compositions of bounded model checking (upper half) and combinations of symbolic execution and predicate analysis (lower half) using different splitters

6.3 RQ 2: Composition of Ranged Analyses for Other Analyses

Evaluation Plan. To investigate whether other analysis combinations benefit from a composition of ranged analyses, we evaluated two combinations: the first uses two instances of BMC (Ra-2bmc), the second one uses symbolic execution on the range \([\pi _{^{_\bot }}, \pi _\tau ]\) and predicate analysis on the range \([\pi _\tau , \pi ^{_{\top }}]\) (Ra-Se-Pred). We are again interested in effectiveness and efficiency.

Results for BMC. The upper part of Tab. 2 contains the results for a composition of ranged analyses using two instances of BMC. In contrast to Ra-2Se, Ra-2bmc does not increase the number of overall correct verdicts compared to Bmc. Ra-2bmc-Rdm9 computes 48 correct verdicts that are not computed by Bmc, but it also fails to compute the correct verdict in 77 cases solved by Bmc. Both observations can mainly be explained by the fact that one analysis computes a result for a task where the other runs into a timeout. Again, we observe that the composition of ranged analyses computes additional alarms (here 36), as both ranged analyses search in different parts of the program.

Fig. 7. Median factor of time increase for different compositions of ranged analyses

When comparing the efficiency, we notice that the CPU time consumed to compute a result for Ra-2bmc-Rdm9 (and all other instances) is higher than for Bmc. On average, the increase is 2.6, and the median is 2.5, whereas the median increase for tasks solved in more than 100 CPU seconds by Bmc is 1.1. For wall time, where we depict the increases in Fig. 7a, the median overall increase is 1.9. This high overall increase is caused by the fact that Bmc can solve nearly 65% of all tasks within ten seconds of wall time. Thus, the effect of computing the splitting has a big impact on the factor. For more complex or larger instances, where Bmc uses more time, the wall time of Ra-2bmc-Rdm9 is comparable; for instances taking more than 100 seconds, both take approximately the same time.

Results for Predicate Analysis and Symbolic Execution. Table 2 also contains the results for the compositions of ranged analyses using predicate analysis and symbolic execution in combination. Here, the column “add.” contains the tasks that are neither solved by Pred nor SymbExec. Both default analyses used in this setting have different strengths, as Pred solves 1 517 tasks not solved by SymbExec, and SymbExec 649 not solved by Pred. 737 tasks are solved by both analyses.

The most successful configuration of the composition of ranged analyses again uses Lb3 for generating the ranges. Ra-Se-Pred-Lb3 computes 635 more overall correct verdicts than SymbExec, but 233 fewer than Pred. It solves 430 tasks not solved by Pred and 918 tasks not solved by SymbExec. Most importantly, it can compute 36 correct proofs and alarms that are found by neither Pred nor SymbExec. The reason why the composition of ranged analyses can solve tasks that are not solvable by one or both instances lies in the fact that both analyses work only on a part of the program, making the verification problem easier. Unfortunately, the remaining part is sometimes still too complex for the analysis to be verified in the given time limit. Then, Ra-Se-Pred-Lb3 cannot compute a final result.
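The numbers reported in the text are mutually consistent, which the following small calculation makes explicit. It is a sketch under the assumption that "solved" always refers to correct verdicts; the totals and the counts of tasks lost by the composition are derived here from the reported numbers and are not stated in the text. All variable names are hypothetical.

```python
# Numbers reported in the text.
only_pred = 1517      # solved by Pred but not by SymbExec
only_symbexec = 649   # solved by SymbExec but not by Pred
both = 737            # solved by both default analyses

# Derived totals of the two default analyses.
pred_total = only_pred + both          # 2254
symbexec_total = only_symbexec + both  # 1386

# Ra-Se-Pred-Lb3: 635 more than SymbExec and 233 fewer than Pred.
ranged_from_symbexec = symbexec_total + 635  # 2021
ranged_from_pred = pred_total - 233          # 2021

# Derived: tasks solved by a default analysis but not by Ra-Se-Pred-Lb3,
# using the 430 (vs. Pred) and 918 (vs. SymbExec) tasks it solves in addition.
lost_vs_pred = pred_total - (ranged_from_pred - 430)              # 663
lost_vs_symbexec = symbexec_total - (ranged_from_symbexec - 918)  # 283
```

Both ways of computing the total for Ra-Se-Pred-Lb3 yield the same value, so the reported differences to Pred and SymbExec agree with the reported overlap of the two default analyses.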

When evaluating the efficiency of Ra-Se-Pred-Lb3, we need to compare it to both Pred and SymbExec. Figure 7b compares the median factor of the wall time increase for Pred and SymbExec. For both, we observe that the median increase factor of the wall time is high (2.1 for Pred and 1.6 for SymbExec) for tasks that are solved quickly (within ten seconds), but decreases for more complex tasks. For tasks that are solved with a wall time greater than 100 s, Ra-Se-Pred-Lb3 takes approx. the same time as Pred, and is 10% faster than SymbExec. It is important to note that Fig. 7b does not include the situation that Pred or SymbExec does not compute a solution but Ra-Se-Pred-Lb3 does. For Pred, these cases happen rarely; for SymbExec, this occurs for 918 tasks. In the median, Ra-Se-Pred-Lb3 needs 15 seconds of wall time to compute a solution when Pred runs into a timeout and 52 seconds when SymbExec does; both would lead to an increase factor smaller than 0.1.

7 Related Work

Numerous approaches combine different verification techniques. Selective combinations [6, 40, 45, 51, 72, 83, 92] consider certain features of a task to choose the best approach for that task. Nesting approaches [3, 4, 25, 26, 30, 32, 49, 82, 84] use one or more approaches as components in a main approach. Interleaved approaches [1, 2, 5, 10, 42, 50, 55, 58, 62, 68, 75, 78, 90, 97] alternate between different approaches that may or may not exchange information. Testification approaches [28, 29, 39, 43, 52, 74, 81] often sequentially combine a verification and a validation approach and prioritize or only report confirmed proofs and alarms. Sequential portfolio approaches [44, 61] run distinct, independent analyses in sequence while parallel portfolio approaches [91, 12, 63, 65, 66, 96] execute various, independent analyses in parallel. Parallel white-box combinations [7, 9, 37, 38, 54, 56, 59, 79] run different approaches in parallel, which exchange information for the purpose of collaboration. Next, we discuss cooperation approaches that split the search space as we do.

A common strategy for dividing the search space in sequential or interleaved combinations is to restrict the subsequent verifiers to the yet uncovered search space, e.g., not yet covered test goals [12], open proof obligations [67], or yet unexplored program paths [8, 10, 19, 31, 33, 41, 42, 47, 53, 71]. Some parallel combinations like CoDiDroid [80], distributed assertion checking [93], or the compositional tester sketched in conditional testing [12] decompose the verification statically into separate subtasks. Furthermore, some techniques split the search space to run different instances of the same analysis in parallel on different parts of the program. For example, conditional static analysis [85] characterizes paths based on their executed program branches and uses sets of program branches to describe the split. Concurrent bounded model checking techniques [69, 77] split paths based on their thread interleavings. Yan et al. [95] dynamically split the input space if the abstract interpreter returns an inconclusive result and analyse the input partitions separately with the abstract interpreter. To realize parallel test-case generation, Korat [76] considers different input ranges in distinct parallel instances. Parallel symbolic execution approaches [82, 86, 87, 88, 89, 94] and ranged model checking [48] split execution paths, thereby often partitioning the execution tree. The sets of paths are characterized by input constraints [89], path prefixes [87, 88], or ranges [82, 86, 94, 48] and are either created statically from an initial shallow symbolic execution [87, 88, 89] or tests [82, 86, 94] or dynamically based on the already explored symbolic execution tree [27, 34, 82, 86, 98]. While we reuse the idea of splitting the program paths into ranges [82, 86, 94, 48], we generalize the idea of ranged symbolic execution [82, 86, 94] to arbitrary analyses and in particular allow combining different analyses. Furthermore, we introduce a new static splitting strategy along loop bounds.

8 Conclusion

Ranged symbolic execution scales symbolic execution by having several analysis instances run on different ranges in parallel. In this paper, we have generalized this idea to arbitrary analyses by introducing and formalizing the notion of a composition of ranged analyses. We have moreover proposed and implemented a novel splitting component based on loop bounds. Our evaluation shows that a composition of ranged analyses can in particular increase the number of solved tasks. It furthermore demonstrates the superiority of the novel splitting strategy. As future work, we see the incorporation of information sharing between analyses running in parallel.