
1 Introduction

In recent years, a number of systems have been proposed to automate the verification of either branching-time properties (e.g. expressed in CTL) or linear-time properties (e.g. LTL) of general integer-manipulating programs [3, 8, 10–12]. Branching-time property verification requires reasoning about sets of states within a transition system that satisfy a particular temporal formula. By contrast, linear-time property verification requires reasoning about sets of paths that satisfy a formula. However, both logics have significantly reduced expressiveness, as they restrict or disallow the interplay between linear-time and branching-time operators. For example, a property asserting that “along some future an event occurs infinitely often” cannot be expressed in either LTL or CTL, yet it is crucial when expressing the existence of fair paths spawning from every reachable state in an infinite-state system. CTL \(^*\), on the other hand, can express CTL properties, LTL properties, and properties that require their interplay, as the examples below demonstrate.

Unfortunately, no fully automatic CTL \(^*\) proving methods for infinite-state systems are known. Despite the existence of automated verification tools for branching-time and linear-time temporal logic, these tools do not support the verification of CTL \(^*\). A key problem is that CTL \(^*\) formulae cannot merely be partitioned into isolated CTL and LTL sub-formulae, as such a partition fails to capture the intricate dependence between state-based and path-based reasoning. In this paper we introduce the first known automatic method capable of proving CTL \(^*\) properties of infinite-state programs. Our contribution is a method that allows for the arbitrary nesting of state-based reasoning within path-based reasoning, and vice versa. To this end, we recursively deconstruct a CTL \(^*\) formula in a way that allows us to determine where the subtle interplay between nested path and state formulae occurs. To reason about the path sub-formulae, we find a sufficient set of branching nondeterministic decisions within a program’s transition relation. We then temporarily substitute these nondeterministic decisions with a partially symbolic determinized form. That is, nondeterministic decisions regarding which paths are taken are determined by variables that summarize the future of the program execution. When moving between path and state formulae, these determinized relations must then be collapsed to incorporate path quantifiers. Preconditions for the given CTL \(^*\) property can then be acquired via existing CTL model checkers.

Based on our approach, we have developed a tool capable of automatically proving properties of programs that no tool could previously fully automate. The paper closes with a description of our experimental results using the developed tool on various programs drawn from industrial examples. Our tool is available under the MIT open-source license at https://github.com/hkhlaaf/T2/tree/T2Star.

Expressiveness of CTL \(^*\). CTL \(^*\) allows us to express properties involving existential system stabilization, stating that an event can eventually become true and stay true from every reachable state. Additionally, it can express “possibility” properties, such as the viability of a system, stating that every reachable state can spawn a fair computation. The properties below can only be expressed using the additional expressive power of CTL \(^*\). Such liveness properties are often essential for verifying systems such as Windows kernel APIs that acquire and release resources, as our experiments later show.

For example, the property \(\mathsf{E}\mathsf{F}\mathsf{G}(\lnot x \wedge (\mathsf{E}\mathsf{G}\mathsf{F}\; x))\) conveys the divergence of paths. That is, there is a path in which a system stabilizes to \(\lnot x\), but every point on said path has a diverging path in which x holds infinitely often. This property is not expressible in CTL or in LTL, yet is crucial when expressing the existence of fair paths spawning from every reachable state in a system. In CTL, one can only examine sets of states, disallowing us to convey properties regarding paths. In LTL, one cannot approximate a solution by trying to disprove either \(\mathsf{F}\mathsf{G}\;\lnot x\) or \(\mathsf{G}\mathsf{F}\;x\), as one cannot characterize these proofs within a path quantifier.

Another CTL \(^*\) property, \(\mathsf{A}\mathsf{G}\big [ (\mathsf{E}\mathsf{G}\;\lnot x ) \vee (\mathsf{E}\mathsf{F}\mathsf{G}\;y)\big ]\), dictates that from every state of a program there exists either a computation in which x never holds or a computation in which y eventually always holds. The linear-time property \(\mathsf{G}(\mathsf{F}x \rightarrow \mathsf{F}\mathsf{G}\;y)\) is significantly stricter, as it requires that on every computation either the first disjunct or the second disjunct holds. Finally, the property \(\mathsf{E}\mathsf{F}\mathsf{G}\big [(x \vee (\mathsf{A}\mathsf{F}\;\lnot y))\big ]\) asserts that there exists a computation along which, from some point onward, whenever x does not hold, all possible futures of the system lead to the falsification of y. This assertion is impossible to express in LTL.

Related Work. Proof systems for the verification of CTL \(^*\), first introduced by [14, 21], have been well-studied. It is known that CTL \(^*\) model checking for infinite-state systems generalizes termination and co-termination and is undecidable. A decision procedure exploring the structure of finite-state \(\omega \)-automata was first introduced to determine the satisfaction of a CTL \(^*\) formula over binary relations in [17], and later extended in [15]. A complete and sound axiomatization of propositional CTL \(^*\) then followed in [26], which inspired the first sound and relatively complete deductive proof system for the verification of CTL \(^*\) properties over possibly infinite-state reactive systems [20]. Proof rules for verifying CTL \(^*\) properties of infinite-state systems were implemented in STeP [4]. However, the STeP system is only semi-automated, as it still requires users to construct auxiliary assertions and participate in the search for a proof.

Model checking CTL \(^*\)  [16] for finite-state programs and other decidable settings has been implemented in [18]. Their approach reduces a CTL \(^*\) formula to \(\mu \)-calculus using a system of fixed-point equations on relations with first-order quantifiers and equalities, and then invokes a \(\mu \)-calculus model checker. By contrast, we seek to verify the undecidable general class of infinite-state programs supporting both control-sensitive and integer properties. Given that \(\mu \)-calculus model checking is polynomial-time equivalent to the solution of parity games [15], one can conceive that the approach in [2] could potentially solve CTL \(^*\) model checking if the latter were reduced to solving parity games by combining [18] and [15]. However, the resulting infinite-state game would integrate the (first-order \(\mu \)-calculus) property within the program, making it difficult to extract invariants pertaining to the program. For this reason, such a series of reductions often inhibits tool performance. Furthermore, [2] requires a manual instantiation of the structure of the assertions, characterizing subsets of the infinite-state game, that are to be found by their tool.

Existing automated tools for verification of infinite-state programs support either branching-time only or linear-time only reasoning, e.g., [3, 5, 8, 10–12, 27]. The important distinction, however, is that these tools do not allow for the interaction between linear-time and branching-time formulae.

Finally, we have adopted and repurposed a symbolic determinization technique similar to the one introduced in [12] for the verification of LTL formulae in the infinite-state setting. Their symbolic determinization is based on the counterexample-guided refinement of generated tree counterexamples, i.e., counterexamples with branching paths. That is, [8] produces a semantics-preserving transformation that encodes the structure of the nested CTL formulae within the state space, allowing for the generation of tree counterexamples. As a result, precondition generation for syntactically partitioned formulae is no longer possible, which limits the interplay between linear-time operators and path quantifiers that our strategy allows.

Limitations. Our tool does not support programs with heap, nor do we support recursion or concurrency. The heap-based programs we consider during our experimental evaluation have been abstracted using an over-approximation technique introduced in [22]. Effective techniques for proving temporal properties of programs with heap remain an open research question. Our technique relies on the availability of CTL model checking and non-termination procedures. It is, in principle, applicable to every class of infinite-state systems for which such procedures are available (provided that integer variables are allowed). Additionally, our procedure is not complete, as we use a series of techniques for safety [24], termination [9, 25], non-termination [19], and CTL [3, 11] that are not complete. Furthermore, our determinization procedure is not complete. We address this issue further in later sections.

2 Preliminaries

Programs. As is standard [23], we treat programs as control-flow graphs, where edges are annotated by the updates they perform to variables. A program is a triple \(P=(\mathcal{L},E,\mathsf{Vars })\), where \(\mathcal{L}\) is a set of locations, E is a set of edges/transitions, and \(\mathsf{Vars }\) is a set of variables. Each edge \(\tau =(\ell ,\rho ,\ell ')\) in E, where \(\ell ,\ell '\in \mathcal{L}\) and \(\rho \) is a condition, specifies possible transitions in the program. The condition \(\rho \) is an assertion in terms of \(\mathsf{Vars }\) and \(\mathsf{Vars }'\), a primed copy of \(\mathsf{Vars }\), where constants range over \(\mathsf{Vals }\). That is, \(\mathsf{Vars }\) refers to the values of variables before an update and \(\mathsf{Vars }'\) refers to the values of variables after an update.

The set of locations includes the first location \(\ell _{_I}\), which has no incoming transitions from other program locations. That is, for every \(\tau =(\ell ,\rho ,\ell ')\in E\) we have \(\ell '\ne \ell _{_I}\). Transitions exiting \(\ell _{_I}\) have their conditions expressed in terms of \(\mathsf{Vars }'\). Locations with incoming transitions from \(\ell _{_I}\) are initial locations. This allows us to encode more complex initial conditions. In figures, we omit \(\ell _{_I}\) and merely display the edges to locations with incoming transitions from \(\ell _{_I}\).

A program gives rise to a transition system \(T=(S,R)\), where \(S=(\mathcal{L}-\{\ell _{_I}\}) \times (\mathsf{Vars }\rightarrow \mathsf{Vals })\) is the set of program states and \(R\subseteq S\times S\) is the transition relation. That is, a program state is a pair \((\ell ,f)\) where \(\ell \ne \ell _{_I}\) and \(f\) is a valuation, i.e., a function from program variables to values. A program can transition from \((\ell ,f_1)\) to \((\ell ',f_2)\) if there exists a transition \((\ell ,\rho ,\ell ')\in E\) such that \((f_1,f_2)\models \rho \). The valuation \((f_1,f_2)\) is a function from \(\mathsf{Vars }\cup \mathsf{Vars }'\) to \(\mathsf{Vals }\) such that for every \(v\in \mathsf{Vars }\), \((f_1,f_2)(v)=f_1(v)\) and \((f_1,f_2)(v')=f_2(v)\). A state \((\ell ,f)\) is considered initial if there is a transition \((\ell _{_I},\rho ,\ell )\) such that \((f_{{-}1},f)\models \rho \), where \(f_{{-}1}\) is some arbitrary valuation. Since \(\rho \) is expressed in terms of \(\mathsf{Vars }'\), the valuation \(f_{-1}\) does not affect the satisfaction of \(\rho \).
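To make the step relation concrete, the following Python sketch encodes a program as locations, edges, and variables, with each condition \(\rho\) written as a predicate over the pre-update valuation (Vars) and the post-update valuation (Vars'). It is illustrative only: the names `Edge`, `Program`, `can_step`, and `is_initial`, the example program, and the name `l_I` for the special location \(\ell_I\) are our own hypothetical choices, not part of the tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Valuation = Dict[str, int]
# rho relates the valuation before the update (Vars) to the one after it (Vars').
Condition = Callable[[Valuation, Valuation], bool]
State = Tuple[str, Valuation]


@dataclass
class Edge:
    src: str
    cond: Condition
    dst: str


@dataclass
class Program:
    locations: List[str]
    edges: List[Edge]
    variables: List[str]
    init_loc: str = "l_I"  # hypothetical name for the special location l_I


def can_step(p: Program, s: State, t: State) -> bool:
    """(l, f1) -> (l', f2) iff some edge (l, rho, l') in E satisfies (f1, f2) |= rho."""
    (l1, f1), (l2, f2) = s, t
    return any(e.src == l1 and e.dst == l2 and e.cond(f1, f2) for e in p.edges)


def is_initial(p: Program, s: State) -> bool:
    """(l, f) is initial iff some edge (l_I, rho, l) satisfies (f_-1, f) |= rho."""
    l, f = s
    f_minus_1 = {v: 0 for v in p.variables}  # arbitrary: rho only mentions Vars'
    return any(e.src == p.init_loc and e.dst == l and e.cond(f_minus_1, f)
               for e in p.edges)


# Hypothetical program: x' = 1 initially, a loop at l1 keeping x, an exit to l2.
prog = Program(
    locations=["l_I", "l1", "l2"],
    edges=[
        Edge("l_I", lambda f, g: g["x"] == 1, "l1"),
        Edge("l1", lambda f, g: g["x"] == f["x"], "l1"),
        Edge("l1", lambda f, g: g["x"] == 0, "l2"),
        Edge("l2", lambda f, g: g["x"] == f["x"], "l2"),  # self-loop keeps paths infinite
    ],
    variables=["x"],
)
```

Because the conditions on edges leaving `l_I` only read Vars', `is_initial` can pass any placeholder for \(f_{-1}\), matching the remark above.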

Given \(V\subseteq \mathsf{Vars }\), the valuation obtained from \(f\) by restricting the valuation to variables in V is denoted by \(f{\Downarrow }_{V}\). The restriction of states of the form \((\ell ,f)\) and paths in the program is defined similarly, e.g., \(\pi {\Downarrow }_{V}\).

Paths. A path or a trace \(\pi \) in P is an infinite sequence of states \((\ell _0,f_0),(\ell _1,f_1),\ldots \), where for every \(i\ge 0\), there exists some \((\ell _i,\rho _i,\ell _{i+1})\in E\) such that \((f_i,f_{i+1})\models \rho _i\). We say that \(\pi \) is an \((\ell ,f)\)-path if \(\ell _0=\ell \) and \(f_0 = f\). Given a program \(P\), a location \(\ell \), and a valuation \(f\), we denote the set of \((\ell ,f)\)-paths in \(P\) by \(\mathsf{Path }(P,\ell ,f)\). We say that \(\pi \) is a computation in \(P\) if its first state \((\ell _0,f_0)\) is initial. Note that we restrict our attention to infinite paths and computations. In practice, we modify programs, transition systems, and temporal logic formulae to ensure that all paths are infinite, as is done, e.g., in [6].

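CTL \(^*\). We are interested in verifying full computation tree logic (CTL \(^*\)) [14, 21]. The syntax of CTL \(^*\) (written in negation normal form) includes state formulae \(\varphi \), which are interpreted over states, and path formulae \(\psi \), which are interpreted over paths. We assume that atomic propositions (ranged over by \(\alpha \)) are expressed in some underlying theory over variables and constants (e.g. \(\mathsf x < \mathsf y \)). State formulae (\(\varphi \)) and path formulae (\(\psi \)) are co-defined:

\[ \varphi ::= \alpha \mid \lnot \alpha \mid \varphi \wedge \varphi \mid \varphi \vee \varphi \mid \mathsf{A}\psi \mid \mathsf{E}\psi \qquad \psi ::= \varphi \mid \psi \wedge \psi \mid \psi \vee \psi \mid \mathsf{G}\psi \mid \mathsf{F}\psi \mid \psi \,\mathsf{W}\, \psi \mid \psi \,\mathsf{U}\, \psi \]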

For a program \(P\) and a CTL \(^*\) state formula \(\varphi \), we say that \(\varphi \) holds at a state s in \(P\), denoted by \(P,s \models \varphi \) if:

  • If \(\varphi =\alpha \), then \(P,s \models \alpha \) iff \(s \models \alpha \)

  • If \(\varphi =\lnot \alpha \), then \(P,s \models \lnot \alpha \) iff \(s \not \models \alpha \)

  • If \(\varphi =\varphi _1\vee \varphi _2\), then \(P,s \models \varphi _1\vee \varphi _2\) iff \( s \models \varphi _1\) or \( s \models \varphi _2\)

  • If \(\varphi =\varphi _1\wedge \varphi _2\), then \(P,s \models \varphi _1\wedge \varphi _2\) iff \( s \models \varphi _1\) and \( s \models \varphi _2\)

  • If \(\varphi =\mathsf{A}\psi \), then \(P,s \models \mathsf{A}\psi \) iff \(\forall \pi = (s,...).\; P,\pi \models \psi \)

  • If \(\varphi =\mathsf{E}\psi \), then \(P,s \models \mathsf{E}\psi \) iff \( \exists \pi = (s,...).{\;} P,\pi \models \psi \)

Path formulae are interpreted over paths. For a program \(P\) and a CTL \(^*\) path formula \(\psi \), we say that \(\psi \) holds on a path \(\pi = (s_{0},s_{1},\ldots )\) in \({P}\) at position i, denoted by \(P,\pi ,i \models \psi \), if:

  • If \(\psi =\varphi \) is a state formula, then \(P,\pi ,i \models \varphi \) iff \(P,{s}_i\models \varphi \).

  • If \(\psi =\psi _1\vee \psi _2\), then \(P,\pi ,i \models \psi _1 \vee \psi _2\) iff \(P,\pi ,i\models \psi _1\) or \(P,\pi ,i\models \psi _2\)

  • If \(\psi =\psi _1\wedge \psi _2\), then \(P,\pi ,i \models \psi _1 \wedge \psi _2\) iff \(P,\pi ,i\models \psi _1\) and \(P,\pi ,i\models \psi _2\)

  • If \(\psi = \mathsf{F}\psi _1\), then \(P,\pi ,i \models \mathsf{F}\psi _1\) iff \( \exists j \ge i.\; P,\pi ,j\models \psi _1\)

  • If \(\psi =\mathsf{G}\psi _1\), then \(P,\pi ,i \models \mathsf{G}\psi _1\) iff \(\forall j \ge i.\; P,\pi ,j\models \psi _1\)

  • If \(\psi =\psi _1\mathsf W \psi _2\), then \(P,\pi ,i \models \psi _1 \mathsf W \psi _2\) iff either \(\exists k\ge i.\; P,\pi ,k\models \psi _2\) and \(\forall i\le j<k.\; P,\pi ,j\models \psi _1\) or \(\forall j \ge i.\; P,\pi ,j\models \psi _1\)

  • If \(\psi =\psi _1 \mathsf {U}\psi _2\), then \(P,\pi ,i \models \psi _1 \mathsf {U} \psi _2\) iff \(\exists k \ge i.\;P,\pi ,k \models \psi _2\) and \(\forall i\le j < k.\; P,\pi ,j \models \psi _1\)

A path formula \(\psi \) holds on a path \(\pi \), denoted by \(P,\pi \models \psi \), if \(P,\pi ,0\models \psi \). For a state formula \(\varphi \), \(\varphi \) holds on \(P\), denoted by \(P\models \varphi \), if for every initial state s we have \(P,s \models \varphi \). When the program P is clear from the context, we may write \(s\models \varphi \) for a state formula \(\varphi \) or \(\pi ,i\models \psi \) for a path formula \(\psi \).
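The path semantics above can be checked mechanically on ultimately periodic (lasso) paths \(u\,v^\omega\), since every position of such an infinite path has the same future as one of the finitely many positions of \(u\,v\). The following Python sketch assumes that states are given as the sets of atomic propositions holding in them, and it covers only path formulae without path quantifiers; the tuple encoding of formulae and the name `holds` are hypothetical. \(\mathsf{F}\) and \(\mathsf{U}\) are computed as least fixpoints, and \(\mathsf{G}\) and \(\mathsf{W}\) as greatest fixpoints, of the same one-step unfolding.

```python
from typing import List, Set

Formula = tuple  # nested tuples, e.g. ("G", ("F", ("ap", "x")))


def holds(psi: Formula, prefix: List[Set[str]], loop: List[Set[str]]) -> bool:
    """Decide P, pi, 0 |= psi for the lasso path pi = prefix . loop^omega."""
    assert loop, "the loop part must be non-empty so that the path is infinite"
    states = prefix + loop
    n = len(states)
    # Successor of each position; the last position jumps back to the loop start.
    succ = [i + 1 if i + 1 < n else len(prefix) for i in range(n)]

    def fixpoint(step, start: bool) -> List[bool]:
        # start=False computes a least fixpoint, start=True a greatest one.
        val = [start] * n
        while True:
            new = [step(i, val) for i in range(n)]
            if new == val:
                return val
            val = new

    def ev(phi: Formula) -> List[bool]:
        op = phi[0]
        if op == "ap":
            return [phi[1] in s for s in states]
        if op == "nap":  # negated atomic proposition (negation normal form)
            return [phi[1] not in s for s in states]
        if op in ("and", "or"):
            a, b = ev(phi[1]), ev(phi[2])
            return [(x and y) if op == "and" else (x or y) for x, y in zip(a, b)]
        if op == "F":  # exists j >= i: least fixpoint
            a = ev(phi[1])
            return fixpoint(lambda i, v: a[i] or v[succ[i]], False)
        if op == "G":  # forall j >= i: greatest fixpoint
            a = ev(phi[1])
            return fixpoint(lambda i, v: a[i] and v[succ[i]], True)
        if op in ("U", "W"):  # U is the least, W the greatest fixpoint of one step
            a, b = ev(phi[1]), ev(phi[2])
            return fixpoint(lambda i, v: b[i] or (a[i] and v[succ[i]]), op == "W")
        raise ValueError(f"unsupported operator {op!r}")

    return ev(psi)[0]
```

For instance, on the lasso with prefix \(\emptyset\) and loop \(\{x\},\emptyset\), `holds` reports \(\mathsf{G}\mathsf{F}\,x\) as true and \(\mathsf{F}\mathsf{G}\,x\) as false.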

The branching-time logic CTL is a restricted subset of CTL \(^*\) in which every temporal operator must be immediately preceded by a path quantifier. That is, the only path formulae allowed are \(\mathsf {G}\varphi _1\), \(\mathsf {F}\varphi _1\), \(\varphi _1{\mathsf {U}}\varphi _2\), and \(\varphi _1{\mathsf {W}}\varphi _2\) for state formulae \(\varphi _1\) and \(\varphi _2\). The linear-time logic LTL is the fragment of CTL \(^*\) that only allows formulae of the form \(\mathsf{A}\psi \), where \(\psi \) contains no path quantifiers. Hence, when LTL is taken as a subset of CTL \(^*\), LTL formulae are implicitly prefixed with the universal path quantifier A.
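The two syntactic restrictions can be phrased as simple recursive checks. The sketch below assumes a hypothetical tuple encoding of negation-normal-form CTL \(^*\) formulae (`"ap"`/`"nap"` for atoms and negated atoms, `"and"`, `"or"`, `"A"`, `"E"`, and the temporal operators `"F"`, `"G"`, `"U"`, `"W"`); the function names are illustrative only.

```python
TEMPORAL = {"F", "G", "U", "W"}


def is_ctl(phi: tuple) -> bool:
    """CTL: each temporal operator sits directly under A or E, with state-formula arguments."""
    op = phi[0]
    if op in ("ap", "nap"):
        return True
    if op in ("and", "or"):
        return is_ctl(phi[1]) and is_ctl(phi[2])
    if op in ("A", "E"):
        psi = phi[1]
        return psi[0] in TEMPORAL and all(is_ctl(arg) for arg in psi[1:])
    return False  # a temporal operator that is not preceded by a path quantifier


def quantifier_free(psi: tuple) -> bool:
    """True iff the formula contains no occurrence of A or E."""
    op = psi[0]
    if op in ("ap", "nap"):
        return True
    if op in ("A", "E"):
        return False
    return all(quantifier_free(arg) for arg in psi[1:])


def is_ltl(phi: tuple) -> bool:
    """LTL inside CTL*: A psi where psi contains no path quantifier."""
    return phi[0] == "A" and quantifier_free(phi[1])
```

Under this encoding, \(\mathsf{E}\mathsf{F}\mathsf{G}(\lnot x \wedge (\mathsf{E}\mathsf{G}\mathsf{F}\; x))\) from the introduction is rejected by both checks, as expected for a property that needs full CTL \(^*\).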

Strongly Connected Subgraphs. We provide some notation regarding strongly-connected subgraphs, followed by the definition of relation pairs below. For a program \(P\), we call an ordered sequence of locations \(\ell _0,...,\ell _n\) a cycle c if \(\ell _n=\ell _0\) and for every \(0\le i<n\) there exists some \((\ell _i,\rho _i,\ell _{i+1})\in E\). Let C be the set of program locations that appear in some cycle, that is, \(C=\{\ell \in \mathcal{L} ~|~ \exists c.\; \ell \in c \}\). For a program \(P\) and the set of locations C, we identify \({\textsc {SCS}}(P,C)\) as some maximal set of non-trivial strongly-connected subgraphs (SCSs) of \(P\) such that every two subgraphs \(G_1,G_2\in {\textsc {SCS}}(P,C)\) are either disjoint or one is contained in the other, and for every \(\ell \in C\) there exists at least one \(G\in {\textsc {SCS}}(P,C)\) such that \(\ell \in G\). The details regarding the identification of C and \({\textsc {SCS}}(P,C)\) are standard and thus omitted here (see, e.g., [13]). We denote the minimal SCS in \({\textsc {SCS}}(P,C)\) that contains a location \(\ell \in \mathcal{L}\) by \({\textsc {MinSCS}}(P,C,\ell )\).

Identifying a program’s strongly-connected subgraphs allows us to find a sufficient set of relation pairs, which characterize instances of branching nondeterministic decisions within a program’s transition relation. A relation pair is a pair \((\rho _1,\rho _2)\) such that for some location \(\ell \) we have that \((\ell ,\rho _1,\ell _1)\) and \((\ell ,\rho _2,\ell _2)\) are transitions of \(P\), \(\ell _1\in {\textsc {MinSCS}}(P,C,\ell )\), and \(\ell _2\notin {\textsc {MinSCS}}(P,C,\ell )\). That is, \(\rho _1\) is the condition for remaining in the (minimal) SCS of \(\ell \) and \(\rho _2\) is the condition for leaving the (minimal) SCS of \(\ell \).
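As a rough illustration of relation pairs, the Python sketch below works on a location graph whose edges carry only a label standing for \(\rho\). It makes a simplifying assumption: \({\textsc {MinSCS}}(P,C,\ell )\) is approximated by the strongly connected component of \(\ell\), which coincides with it when loops are not nested; a faithful implementation would compute the nested SCS decomposition instead. The function names and the example graph are hypothetical.

```python
from itertools import product
from typing import FrozenSet, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source location, label of rho, target location)


def reachable(edges: List[Edge], start: str) -> Set[str]:
    """Locations reachable from `start` by following one or more edges."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        cur = stack.pop()
        for src, _, dst in edges:
            if src == cur and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def scc_of(edges: List[Edge], loc: str) -> FrozenSet[str]:
    """Non-trivial SCC containing `loc`, or the empty set if `loc` lies on no cycle."""
    forward = reachable(edges, loc)
    if loc not in forward:
        return frozenset()
    return frozenset(m for m in forward if loc in reachable(edges, m))


def relation_pairs(edges: List[Edge]) -> List[Tuple[str, str, str]]:
    """All (l, rho1, rho2) where rho1 stays in the SCC of l and rho2 leaves it."""
    pairs = []
    for (l1, rho1, d1), (l2, rho2, d2) in product(edges, repeat=2):
        if l1 != l2:
            continue
        g = scc_of(edges, l1)
        if g and d1 in g and d2 not in g:
            pairs.append((l1, rho1, rho2))
    return pairs


# Hypothetical shape: a self-loop rho2 at l1, an exit rho3 to l2, and a sink loop at l2.
example = [("l1", "rho2", "l1"), ("l1", "rho3", "l2"), ("l2", "rho4", "l2")]
```

On `example`, only `l1` offers a choice between staying in its SCC and leaving it, so `relation_pairs(example)` yields the single pair for `rho2` and `rho3`.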

3 Overview

In this section, we present a brief overview of our CTL \(^*\) verification procedure ProveCTL \(^*\), shown in Fig. 3, with an in-depth explanation provided later in Sect. 4. The procedure recurses over the structure of a given CTL \(^*\) formula, and for each sub-formula \(\theta \) it produces a precondition a that ensures its satisfaction. That is, a is an assertion over program variables and locations characterizing the states of the program that satisfy \(\theta \). We start by finding the precondition of the innermost sub-formula, and then search for the preconditions of the outer sub-formulae that depend on it.

A given CTL \(^*\) formula is deconstructed to differentiate between state and path sub-formulae, as the crux of verifying CTL \(^*\) formulae lies within identifying the interplay between the arbitrary nesting of path and state formulae. Preconditions for branching-time logic state formulae can be acquired via existing CTL model checking techniques which return an assertion characterizing the states in which a sub-formula holds. The essence of our algorithm is thus within how we acquire sufficient preconditions for path formulae that admit a sound interaction with state formulae. The algorithm is based on the procedures below, which are defined in later sections of the paper:

Approximate is a procedure that performs a syntactic conversion from a path formula to its corresponding over-approximated universal CTL formula (ACTL). The over-approximated formula can then be checked by an existing CTL model checker over a partially symbolic determinized form of the program, reducing path formula verification to state formula verification.

Determinize allows us to reason about path characterization through state characterization, as the satisfaction of an over-approximated ACTL formula implies the satisfaction of the path formula. However, the converse does not hold. The procedure thus constructs a partially determinized program over the symbolic representations of all characterized instances of branching nondeterminism (i.e. relation pairs) stemming from the same program location \(\ell \). That is, nondeterministic decisions regarding which paths are taken are determined by prophecy variables, which predict future outcomes of the program execution, and their values [1]. Recall that, of the two transitions in a relation pair, one remains in the minimal SCS of \(\ell \) and the other leaves it.

QuantElim acquires the proper set of states that satisfy a formula which has been verified over a determinized program. This allows for the path quantification present within a CTL \(^*\) formula, that is, whether all paths (or some paths) starting from a state satisfy a path formula. When a CTL \(^*\) formula of the form \(\theta \,\,{:}{:=}\,\mathsf{A}\psi \mid \mathsf{E}\psi \) is reached after acquiring a set of states satisfying \(\psi \), \(\theta \) is verified on the same determinized program used for \(\psi \). We then must use quantifier elimination to acquire the proper set of states that satisfy \(\theta \), thus quantifying the assertions over the values of the prophecy variables. If the formula is of the form \(\mathsf{A}\psi \), we universally quantify the prophecy variables appearing in the set of states that satisfy \(\mathsf{A}\psi \). If the formula is of the form \(\mathsf{E}\psi \), we existentially quantify the prophecy variables.

Fig. 1.

(a) The control-flow graph of a program for which we wish to prove the CTL \(^*\) property \(\mathsf{E}\mathsf{F}\mathsf{G}\; x = 1\). (b) The control-flow graph after calling Determinize; it includes the prophecy variable \(n_{\ell _1}\) corresponding to the nondeterministic relation pair \((\rho _2,\rho _3)\).

Example. Consider the program in Fig. 1(a) and the property \(\mathsf{E}\mathsf{F}\mathsf{G}\;x = 1\) stating that there exists a possible future where \(x = 1\) will eventually become true and stay true. This is a system stabilization property which can only be expressed in CTL \(^*\). We begin by identifying that \(\mathsf{G}\;x = 1\) is a path formula, and thus use Approximate to return the over-approximated state formula \(\mathsf{A}\mathsf{G}\;x = 1\). We then initiate a CTL model checking task where we seek a set of states \(a_\mathsf{G}\) such that \(\mathsf{E}\mathsf{F}a_\mathsf{G}\) holds, and for every state s such that \(s\models a_\mathsf{G}\) we have \(s\models \mathsf{A}\mathsf{G}\;x = 1\).

Our formula would now be valid only if some possible future from the program’s initial states eventually reaches a set of states in which \(\mathsf{A}\mathsf{G}\;x = 1\) holds. However, no such set of states exists, as the nondeterministic choice at \(\ell _1\) between \(\rho _2\) and \(\rho _3\) does not allow us to determine whether we will eventually leave the loop. That is, no set of states can separate the paths that eventually leave \(\rho _2\) through \(\rho _3\) from the paths that remain in \(\rho _2\) forever. In order to reason about the original sub-formula \(\mathsf{G}\;x = 1\), we must observe sets of paths, not states. Since we over-approximated our formula in a way that only allows us to reason about states, we symbolically determinize the program: all related paths through the control-flow graph are simulated simultaneously and are separated so that they originate from distinct states of the program.

Our procedure Determinize then returns a new, partially symbolically determinized system in which a newly introduced prophecy variable, named \(n_{\ell _{1}}\) in Fig. 1(b), is associated with the relation pair \(({\rho }_{2},{\rho }_{3})\) and is used to predict the occurrences of the transitions \({\rho }_{2}\) and \({\rho }_{3}\). Recall that relation pairs correspond to pairs of nondeterministic transitions, one remaining in an SCS and the other leaving the same SCS. In this case, \({\rho }_{2}\) remains in the strongly connected subgraph of \({\ell }_1\), whereas \({\rho }_{3}\) leaves it.

Since we initialize \(n_{{\ell }_{1}}\) to a nondeterministic value, on every path a non-negative concrete value chosen at the nondeterministic assignment predicts the number of times transition \({\rho }_2\) is taken before transitioning to \(\rho _3\). That is, we remain in \({\rho }_2\) until \(n_{{\ell }_{1}} = 0\), with \(n_{\ell _{1}}\) being decremented at each passage through the loop. Once we leave the loop, the prophecy variable is nondeterministically reset (in case we return to the same loop later). A negative value of \(n_{\ell _{1}}\) denotes remaining in \(\rho _2\) forever, i.e., non-termination of the loop.
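The counting behaviour of the prophecy variable can be simulated directly. The Python sketch below assumes the guards introduced by Determinize (described in Sect. 4): \(\rho_2\) is enabled only when \(n_{\ell_1}\ne 0\) and decrements it, and \(\rho_3\) is enabled only when \(n_{\ell_1}=0\). The function name and the step bound that stands in for "forever" are hypothetical.

```python
from typing import Optional


def iterations_before_exit(n: int, bound: int = 1000) -> Optional[int]:
    """Count rho2 steps taken before rho3 fires, given the prophecy value n.

    Mirrors the determinized guards: rho2 needs n != 0 and decrements n,
    rho3 needs n == 0. Returns None if the loop is still running after
    `bound` steps (a finite stand-in for non-termination).
    """
    for taken in range(bound):
        if n == 0:
            return taken  # only rho3 is enabled: leave the loop
        n -= 1            # only rho2 is enabled: stay and decrement
    return None
```

Because exactly one of \(n_{\ell_1}\ne 0\) and \(n_{\ell_1}=0\) holds in every state, the choice between \(\rho_2\) and \(\rho_3\) is no longer nondeterministic once \(n_{\ell_1}\) is fixed; for example, a value of 3 yields three passages through \(\rho_2\), while any negative value never reaches 0.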

We can now utilize an existing CTL model checker to return an assertion characterizing the states in which \(\mathsf{G}\;x = 1\) holds, by verifying the determinized program, denoted by \(P_D\), against the over-approximated CTL formula \({\mathsf {AG}} {\;}x = 1\). The assertion \(a_\mathsf{G}=(\ell _1 \wedge n_{\ell _1}<0)\) is returned, and we replace the sub-formula with this assertion in the original CTL \(^*\) formula, resulting in \(\mathsf{E}\mathsf{F}a_\mathsf{G}\). Syntactically, this outermost formula is already a CTL formula. However, we cannot simply check it with a CTL model checker over the original program: the path quantifier \(\mathsf{E}\) ranges over paths within a larger context that also reasons about the nested path formula \(\mathsf{F}\mathsf{G}\;x = 1\), and \(a_\mathsf{G}\) itself refers to the prophecy variable \(n_{\ell _1}\). We must therefore use the CTL model checker to verify \({\mathsf {EF}} a_{\mathsf {G}}\) over the same determinized program \(P_D\).

Our procedure returns the same precondition \((\ell _1 \wedge n_{\ell _1}<0)\). We then use quantifier elimination to existentially quantify out all introduced prophecy variables. The existential quantification corresponds to searching for some path (or paths) that satisfy the path formula. Thus, if there is a state s in the original program and some value v of the prophecy variables such that all paths from the combined state \((s,n_{\ell _1}=v)\) in \(P_D\) satisfy the path formula, then these paths give us a sufficient proof to conclude that \(\mathsf{E}\mathsf{F}\mathsf{G}\;x=1\) holds from s in P.
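For the running example, the quantifier-elimination step can be worked out by hand. Over the integers, existentially quantifying the prophecy variable out of the returned precondition (as done here for \(\mathsf{E}\)) and, for contrast, universally quantifying it (as would be done for an \(\mathsf{A}\) formula) give the following. This derivation only illustrates the elimination itself and ignores the additional restriction to states from which an infinite path exists, which QuantElim also applies.

\[
\exists n_{\ell _1}.\,(\ell _1 \wedge n_{\ell _1} < 0) \;\equiv\; \ell _1 \wedge \exists n_{\ell _1}.\, n_{\ell _1} < 0 \;\equiv\; \ell _1,
\qquad
\forall n_{\ell _1}.\,(\ell _1 \wedge n_{\ell _1} < 0) \;\equiv\; \ell _1 \wedge \forall n_{\ell _1}.\, n_{\ell _1} < 0 \;\equiv\; \mathit{false}.
\]

The existential result no longer mentions \(n_{\ell _1}\), so it is an assertion over the states of the original program P.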

4 Checking CTL \(^*\) Formulae

In this section, we describe the details of our CTL \(^*\) model checking procedure ProveCTL \({}^{*}\). We first define the procedures utilized by ProveCTL \({}^{*}\), namely Determinize and \({\textsc {Approximate}}\), followed by our model checking procedure and its utilization of QuantElim.

Fig. 2.

(a) Determinize identifies relation pairs and constructs a symbolically determinized program over them. (b) Approximate produces a syntactic conversion from a path formula to its corresponding over-approximation in ACTL. (c) Verify wraps \({{\textsc {ProveCTL}^*}}\) and then checks all initial states. (d) QuantElim applies quantifier elimination in order to convert path characterization to state characterization, restricting attention to states from which an infinite path exists.

Fig. 3.

Our recursive CTL \(^*\) verification procedure employs an existing CTL model checker and uses our procedures Approximate and QuantElim. It expects a CTL \(^*\) property \(\theta \), a program \(P\), and its determinized version \(P_D\) as parameters. An assertion characterizing the states in which \(\theta \) holds is returned along with a boolean value indicating whether the formula checked was a path formula (and hence approximated).

Determinize. The procedure Determinize constructs a form of partially symbolically determinized program over relation pairs that characterize instances of branching nondeterminism. We present our procedure in Fig. 2(a), where a program \(P\) is given and a partially determinized program \(P_D\), contingent upon nondeterministic relation pairs, is returned. Ultimately, Determinize is designed to allow proof tools for branching-time logic state formulae to be used to reason about path formulae.

We begin by finding a sufficient set of relation pairs with which to symbolically determinize the program into one that has the same set of paths as the original. A relation pair arises wherever two nondeterministic transitions stem from the same location but do not both remain within its strongly-connected subgraph. Our procedure thus begins by iterating over the program’s edges \((\ell ,\rho ,\ell ') \in E\) on line 6. On lines 7 and 8, we check whether \(\ell \in C\), i.e., whether \(G = {\textsc {MinSCS}}(P,C,\ell )\) satisfies \(G \ne \emptyset \). If from some location \(\ell \), where \(G = {\textsc {MinSCS}}(P,C,\ell )\), there is an edge to \(\ell '\) such that \({\textsc {MinSCS}}(P, C, \ell ')\) differs from G, we can conclude that the transition from \(\ell \) to \(\ell '\) leaves the SCS of \(\ell \). We compare minimal SCSs because such an edge points to the nondeterministic decision point at which a transition diverges from remaining within an SCS. This decision point is key to identifying where determinization must occur so that state-based reasoning can be applied to path-based reasoning for a given program P.

If the strongly connected subgraphs of \(\ell \) and \(\ell '\) do differ, we add \(\ell \) to \({\textsc {Synth}}\), a list which tracks locations with nondeterministic points. For every such location, we identify a relation pair corresponding to the decision of either remaining in the same SCS or leaving it. After finding all elements of \({\textsc {Synth}}\), on line 11 we iterate over the program edges, and for each relation pair encountered we introduce a new prophecy variable to predict the future outcome of the decision. Our motivation is to identify nondeterministic points so that we can symbolically simulate all possible branching paths through a program, while decisions regarding which paths are taken are determined by prophecy variables and their values. Information regarding different paths is now stored in the state of the modified program. This yields a correspondence through which the verification of path formulae can be reduced to the verification of ACTL formulae.

When an edge \((\ell ,\rho ,\ell ') \in E\) is reached containing \(\ell \in {\textsc {Synth}}\), a prophecy variable \(n_\ell \in \mathbb {Z}\) is added to the set of program variables \(\mathsf{Vars }\) at line 13. If \(\ell '\) is contained within \({\textsc {MinSCS}}(P,C,\ell )\), we constrain \(\rho \) by requiring that \(n_\ell \ne 0\), and then decrement \(n_\ell \). If \(\ell '\) is not contained within \({\textsc {MinSCS}}(P,C,\ell )\), we constrain \(\rho \) by \(n_\ell = 0\), and \(n_\ell '\) remains unconstrained, entailing a reset to a nondeterministic integer. The nondeterministic decision of the number of times a cycle is passed through is thus now determined by the prophecy variable \(n_\ell \). In the case that \(n_\ell < 0\), this rule corresponds to behaviors where every visit to \(\ell \) is followed by a successor in the same SCS (i.e., the computation always remains in the SCS of \(\ell \)). The nondeterminism within a transition relation is thus either determined at initialization by the initial choice of values for \(n_\ell \) or else later in a path by choosing new nondeterministic values for \(n_\ell \).
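The edge instrumentation can be sketched in Python as a transformation on edges whose conditions are predicates over the pre- and post-update valuations. This is a hypothetical rendering, not the tool's code: `determinize`, the `n_<loc>` naming scheme, and the precomputed `min_scs` map (standing in for MinSCS) are our own assumptions, and edges from locations outside Synth are simply left unchanged.

```python
from typing import Callable, Dict, List, Set, Tuple

Valuation = Dict[str, int]
Condition = Callable[[Valuation, Valuation], bool]  # rho over (Vars, Vars')
Edge = Tuple[str, Condition, str]


def determinize(edges: List[Edge], synth: Set[str],
                min_scs: Dict[str, Set[str]]) -> List[Edge]:
    """Guard every edge leaving a location l in `synth` with the prophecy variable n_l."""
    result: List[Edge] = []
    for src, rho, dst in edges:
        if src not in synth:
            result.append((src, rho, dst))  # untouched edge
            continue
        n = f"n_{src}"
        if dst in min_scs[src]:
            # Stay in the SCS: require n_l != 0 and decrement it (n_l' = n_l - 1).
            def cond(f, g, rho=rho, n=n):
                return rho(f, g) and f[n] != 0 and g[n] == f[n] - 1
        else:
            # Leave the SCS: require n_l = 0; n_l' is unconstrained (nondeterministic reset).
            def cond(f, g, rho=rho, n=n):
                return rho(f, g) and f[n] == 0
        result.append((src, cond, dst))
    return result


# Hypothetical loop at l1: rho2 keeps x and stays, rho3 sets x to 0 and exits to l2.
edges = [
    ("l1", lambda f, g: g["x"] == f["x"], "l1"),
    ("l1", lambda f, g: g["x"] == 0, "l2"),
]
det = determinize(edges, synth={"l1"}, min_scs={"l1": {"l1"}})
```

The default arguments `rho=rho, n=n` bind each edge's own condition and variable name at definition time, so every wrapped predicate refers to the edge it was built from.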

We show that the determinization maintains the set of paths of the original program and that the introduced prophecy variables merely trade nondeterminism in the transition relation for a larger, nondeterministic state space.

Theorem 1

For every path \(\pi \) in P there is a path \(\pi '\) in \(P_D\) such that \(\pi '{\Downarrow }_{\mathsf{Vars }}=\pi \). Furthermore, for every path \(\pi '\) in \(P_D\) it holds that \(\pi '{\Downarrow }_{\mathsf{Vars }}\) is a path in P.

Proof

See TR [7], Appendix A.

Approximate. In Fig. 2(b), we present a syntactic conversion from pure linear-time formulae in CTL \(^*\), that is, LTL, to a corresponding over-approximation in ACTL. The procedure is given a path formula \(\psi \) and two atomic preconditions, \(a_{\theta '_1}\) and \(a_{\theta '_2}\), corresponding to satisfaction of the nested CTL \(^*\) formulae that appear within \(\psi \). The precondition \(a_{\theta '_2}\) is an optional parameter, used only when the LTL operator takes two arguments (e.g. W, U, \(\wedge \), \(\vee \)). Due to the recursive nature of ProveCTL \({}^{*}\), presented in the next section, these preconditions will already have been generated.

On lines 3–7, we insert a universal path quantifier \(\mathsf{A}\) before the appropriate temporal operators. In addition, the sub-formulae \(\theta '_1\) and \(\theta '_2\) are replaced with their corresponding preconditions \(a_{\theta '_1}\) and \(a_{\theta '_2}\), respectively. This matches how ProveCTL \({}^{*}\) recursively processes each inner sub-formula before searching for the preconditions of the outer sub-formulae that depend on it. Replacing a path formula by its ACTL approximation is sound in the sense that if the modified formula holds, then the original holds as well.
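A minimal sketch of the conversion follows. It assumes a tuple-based formula syntax of our own and assumes that nested CTL \(^*\) sub-formulae have already been replaced by atoms naming their preconditions; Fig. 2(b) may differ in details, for instance in how Boolean connectives are handled. The function name `approximate` is hypothetical.

```python
# Hypothetical formula syntax (nested state sub-formulae already abstracted):
#   ("atom", name) | ("X" | "F" | "G", f) | ("U" | "W", f, g) | ("and" | "or", f, g)
UNARY_TEMPORAL = {"X", "F", "G"}
BINARY_TEMPORAL = {"U", "W"}
BOOLEAN = {"and": "&", "or": "|"}


def approximate(psi: tuple) -> str:
    """Render an LTL formula as ACTL by prefixing every temporal operator with A."""
    op = psi[0]
    if op == "atom":
        return psi[1]  # an atomic proposition or a precondition such as a_{theta'_1}
    if op in UNARY_TEMPORAL:
        return f"A{op}({approximate(psi[1])})"
    if op in BINARY_TEMPORAL:
        return f"A[{approximate(psi[1])} {op} {approximate(psi[2])}]"
    if op in BOOLEAN:
        return f"({approximate(psi[1])} {BOOLEAN[op]} {approximate(psi[2])})"
    raise ValueError(f"unsupported LTL operator: {op!r}")


p, a1, a2 = ("atom", "p"), ("atom", "a1"), ("atom", "a2")
print(approximate(("F", ("G", p))))  # AF(AG(p))
print(approximate(("U", a1, a2)))    # A[a1 U a2]
```

The first example shows why the result is only an approximation: \(\mathsf{A}\mathsf{F}\mathsf{A}\mathsf{G}\,p\) implies \(\mathsf{A}\mathsf{F}\mathsf{G}\,p\), but not conversely, which is the direction guaranteed by Theorem 2.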

Theorem 2

For every program P, a state \((\ell ,f)\), and a path formula \(\psi \), if \(P,(\ell ,f)\models {{\textsc {Approximate}} {(}{\psi }{{)}}}\) then \(P,(\ell ,f) \models \mathsf{A}\psi \).

Proof

See TR [7], Appendix A.

Theorem 2 does not consider existential path quantification. Recall that in order to conclude \(P,s\models \mathsf{E}\psi \) for some path formula \(\psi \), we require that there is some value v of the prophecy variables such that \(P_D,(s,v)\models \mathsf{A}\psi \). This means that when we restrict attention to a certain set of paths that start in state s (those that match the valuation v of the prophecy variables), all paths in the set satisfy \(\psi \). Provided this set is nonempty, this satisfies the requirement that some path satisfies the formula.
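The reasoning step can be spelled out as follows. This is our own reconstruction, assuming that \(\psi \) refers only to the original variables \(\mathsf{Vars }\) and that \((s,v)\) has at least one infinite path in \(P_D\), which is the purpose of the check for E G True in QuantElim.

```latex
\begin{align*}
&P_D,(s,v) \models \mathsf{A}\psi \ \text{ and } \ P_D,(s,v) \models \mathsf{E}\mathsf{G}\,\mathit{True}
  && \text{(for some prophecy value } v\text{)}\\
\Longrightarrow\ &\text{some infinite path } \pi' \text{ of } P_D \text{ from } (s,v) \text{ satisfies } \pi' \models \psi
  && \text{(}\mathsf{A}\psi \text{ holds on every path from } (s,v)\text{)}\\
\Longrightarrow\ &\pi'{\Downarrow}_{\mathsf{Vars}} \text{ is a path of } P \text{ from } s \text{ and } \pi'{\Downarrow}_{\mathsf{Vars}} \models \psi
  && \text{(Theorem 1; } \psi \text{ mentions only } \mathsf{Vars}\text{)}\\
\Longrightarrow\ &P, s \models \mathsf{E}\psi
\end{align*}
```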

4.1 ProveCTL*

In this section, we present our main CTL \(^*\) verification procedure. Fig. 2(c) depicts Verify, which wraps the main procedure \({{\textsc {ProveCTL}^*}}\), shown in Fig. 3. Verify first generates a determinized copy of the program, \(P_D\), using the aforementioned procedure Determinize. This program is then passed to \({\textsc {ProveCTL}^*} \) along with the original program \(P\) and a CTL \(^*\) property \(\theta \). \({\textsc {ProveCTL}^*} \) returns an assertion a characterizing the states in which \(\theta \) holds. The second returned value is disregarded, as indicated by “_", since it is only used within the recursive calls of \({\textsc {ProveCTL}^*} \). When \({\textsc {ProveCTL}^*} \) returns to Verify, it only remains to check whether the precondition a is satisfied by the initial states of the program.
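A minimal sketch of the control flow of Verify, with Determinize and \({\textsc {ProveCTL}^*}\) passed in as callables and assertions modeled as Python predicates over states; all names are hypothetical. Enumerating initial states only works for a finite toy example; for infinite-state programs the final step would instead be a symbolic check that the initial condition implies a.

```python
from typing import Any, Callable, Iterable, Tuple

State = dict                        # hypothetical: a state maps variable names to values
Assertion = Callable[[State], bool]


def verify(theta: Any,
           program: Any,
           initial_states: Iterable[State],
           determinize: Callable[[Any], Any],
           prove_ctl_star: Callable[[Any, Any, Any], Tuple[Assertion, bool]]) -> bool:
    """Determinize, compute a precondition for theta, then check the initial states."""
    p_d = determinize(program)                   # P_D
    a, _ = prove_ctl_star(theta, program, p_d)   # the Path flag is discarded here
    return all(a(s) for s in initial_states)     # every initial state must satisfy a
```

With stub arguments, e.g. a prover that returns the assertion `x >= 0`, `verify` returns true exactly when every listed initial state has a nonnegative `x`.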

In order to synthesize a precondition for a CTL \(^*\) property \(\theta \), we first recursively accumulate the preconditions generated for the sub-formulae of \(\theta \) at lines 9, 10, 12, 25, 26, and 28. That is, for each sub-formula \(\theta \), we produce a precondition \(a_\theta \) that ensures its satisfaction. We note that the precondition of an atomic proposition \(\alpha \) is the proposition itself. A given CTL \(^*\) formula is then deconstructed to differentiate between state and path sub-formulae, as the crux of verifying CTL \(^*\) formulae lies in identifying the interplay between the arbitrary nesting of path and state formulae. On line 3, if \(\theta \) can be identified as a state formula \(\varphi \), we carry out the actions on lines 4–21. If \(\theta \) is identified as a path formula \(\psi \), we carry out the actions on lines 22–31.

Verifying Path Formulae. When a path formula \(\psi \) is reached, we begin by over-approximating it, syntactically converting it to the universal fragment of branching-time logic (ACTL) using the procedure Approximate. Recall that the preconditions generated for the sub-formula(e) of \(\psi \) at lines 25, 26, and 28 are used by Approximate to replace \(\theta '_1\) and \(\theta '_2\) with their corresponding preconditions \(a_{\theta '_1}\) and \(a_{\theta '_2}\), respectively. On line 29, Approximate returns a corresponding state formula \(\psi '\) in which a universal path quantifier precedes every temporal operator within \(\psi \).

A precondition for the newly obtained ACTL formula \(\psi '\) can now be acquired via existing CTL model checkers, which return an assertion characterizing the states in which \(\psi '\) holds. Existing tools that support this functionality include [3] and [11]; our prototype builds upon the latter. Recall that a precondition for a path formula requires more than a precondition for the corresponding state formula, as \(\psi '\) is merely an over-approximation. We must therefore use the determinized program \(P_D\) rather than the original program P when invoking a CTL model checker, as shown on line 30. The assertion \({a_{\theta }}\) is then returned, characterizing the set of states in which \(\theta \) holds.

Recall that \(P_D\) yields a closer correspondence between \(\psi \) and \(\psi '\). That is, we find a sufficient set of relation pairs that determinize the program into one with the same set of paths as the original, yet in which decisions regarding which paths are taken are determined by the introduced prophecy variables and their values, allowing us to reduce path-based reasoning to state-based reasoning.

Finally, on line 31, we set the Boolean flag \({\textsc {Path}}\) to true. This flag is the second value returned by \({\textsc {ProveCTL}^*} \). It indicates to the caller that the result \(a_\theta \) returned by the recursive call is approximated. The caller uses \({\textsc {Path}}\) to decide whether to use \(a_\theta \) as is (when the verified sub-formula is a state formula) or to modify it (when it is a path formula), which admits a sound interaction between state and path formulae.

Verifying State Formulae. When a state formula \(\varphi \) is reached, we partition the state sub-formulae according to the syntax of CTL, as shown on lines 6–8 and 11. This not only allows us to use existing CTL model checkers, but also avoids redundantly verifying a temporal operator that is already preceded by a path quantifier. As a consequence of partitioning \(\varphi \) in this way, a path formula \(\psi \) is always a pure linear-time path formula, that is, LTL. This particular deconstruction of a CTL \(^*\) formula is what allows us to identify the intricate interplay between path and state formulae.
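One way to realize this partitioning is sketched below under our own tuple syntax; the names `is_state` and `abstract_state_subformulae` are hypothetical. Every maximal non-atomic state sub-formula inside a path formula is replaced by a fresh atom, whose precondition would be computed by a recursive call, so the remaining path formula contains no path quantifiers and is therefore pure LTL.

```python
# Hypothetical CTL* syntax as nested tuples:
#   ("atom", p) | ("not", f) | ("and" | "or", f, g)   Boolean layer
#   ("A" | "E", f)                                    path quantifiers
#   ("X" | "F" | "G", f) | ("U" | "W", f, g)          temporal operators
TEMPORAL = {"X", "F", "G", "U", "W"}


def is_state(f: tuple) -> bool:
    """A formula is a state formula unless a temporal operator occurs outside A/E."""
    op = f[0]
    if op in ("atom", "A", "E"):
        return True
    if op in TEMPORAL:
        return False
    return all(is_state(g) for g in f[1:])  # Boolean connectives


def abstract_state_subformulae(psi: tuple, table: dict) -> tuple:
    """Replace each maximal non-atomic state sub-formula of psi by a fresh atom.

    The result contains no path quantifiers, i.e. it is pure LTL; `table`
    records which sub-formula each fresh atom stands for.
    """
    if psi[0] == "atom":
        return psi
    if is_state(psi):
        name = f"a{len(table) + 1}"
        table[name] = psi
        return ("atom", name)
    return (psi[0],) + tuple(abstract_state_subformulae(g, table) for g in psi[1:])
```

For \(\mathsf{E}\,\mathsf{F}(p \wedge \mathsf{A}\mathsf{G}\,q)\), the body \(\mathsf{F}(p \wedge \mathsf{A}\mathsf{G}\,q)\) becomes \(\mathsf{F}\,a_1\), with \(a_1\) standing for \(p \wedge \mathsf{A}\mathsf{G}\,q\); a precondition for \(a_1\) is obtained first and then passed to Approximate.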

We begin by recursively generating preconditions for the sub-formula(e) of \(\varphi \) at lines 9, 10, and 12. These preconditions are then used by the procedure Replace on line 15, which substitutes \(\theta '_1\) and \(\theta '_2\) with their corresponding preconditions \(a_{\theta '_1}\) and \(a_{\theta '_2}\), respectively, and returns a new state formula \(\varphi '\). Preconditions for branching-time state formulae can be acquired via existing CTL model checkers. However, to allow the path quantifiers of a CTL \(^*\) formula to range over path formulae, we must consider whether all or some paths starting from a particular state satisfy a path formula. This is required when the immediate inner sub-formula is a pure linear-time path formula, which is signaled by the aforementioned Boolean flag Path given the partitioning of \(\theta \). The role of Path is to track whether a sub-formula of the current formula is a path formula. That is, Path indicates that the path quantifier applies to a path formula, and not to a branching-time state formula. Such a formula must therefore be verified using \(P_D\), yet the set of states of \(P_D\) that characterizes it actually represents a set of paths. This set of paths must later be collapsed into a characterization of the set of states of \(P\) in which the (state) formula holds. This is the key to allowing the interplay between state and path formulae.

The procedure QuantElim, presented in Fig. 2(d), which converts a path characterization into a state characterization, is executed at line 17. QuantElim takes the assertion a returned by calling a CTL model checker on the determinized program \(P_D\) with the partitioned CTL formula \(\varphi '\), as well as the original formula \(\varphi \). It then quantifies the assertion over the values of the prophecy variables. If \(\varphi \) is a universal CTL formula, we universally quantify the prophecy variables appearing in the set of states that satisfy \(\varphi \) on line 4 in Fig. 2(d). If \(\varphi \) is an existential CTL formula, we existentially quantify the prophecy variables on line 5. Predictions of the prophecy variables may cause finite paths to appear in the program, so quantification must be restricted to states for which there exists a prophecy value leading to infinite paths. Hence, on line 2 we acquire for \(P_D\) the precondition \(a_{\mathsf{E}\mathsf{G}}\) of the CTL formula E G True, which characterizes nontermination. The precondition \(a_{\mathsf{E}\mathsf{G}}\) is then combined with a, according to the polarity of the quantification (universal or existential), to ensure that the quantification over prophecy variables does not include finite paths generated by invalid predictions of the prophecy variables. The assertion \(a_\theta \) returned by \({\textsc {QuantElim}}\) characterizes the set of states in which \(\theta \) holds.
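A minimal sketch of the quantification step, under two loudly stated assumptions: the prophecy domain is replaced by a finite sample of values so that the quantifiers can be evaluated by enumeration (the actual procedure quantifies symbolically over \(\mathbb {Z}\)), and we read "according to the polarity" as guarding the universal case by implication and the existential case by conjunction. The names `quant_elim`, `a` and `a_eg` are hypothetical.

```python
from typing import Callable, Iterable

# Hypothetical encoding: assertions over P_D states are predicates on
# (state of P, value of the single prophecy variable n).
Pred = Callable[[int, int], bool]


def quant_elim(universal: bool, a: Pred, a_eg: Pred,
               prophecies: Iterable[int]) -> Callable[[int], bool]:
    """Eliminate the prophecy variable n by enumerating a finite sample of values.

    universal:    forall n. (a_EG(s, n) -> a(s, n))
    existential:  exists n. (a_EG(s, n) and a(s, n))
    """
    values = list(prophecies)
    if universal:
        return lambda s: all(a(s, n) for n in values if a_eg(s, n))
    return lambda s: any(a_eg(s, n) and a(s, n) for n in values)
```

For a loop controlled by a prophecy variable n as in Determinize, let `a(s, n)` hold iff `n >= 0` (the computation eventually leaves the loop) and let `a_eg` hold everywhere: the universal result is false and the existential one true, i.e. some but not all paths leave the loop. If a prediction only satisfies `a` vacuously because it yields a finite path, `a_eg` excludes it as a witness for the existential case.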

If Path is false, the immediate inner sub-formula is a state formula. In this case we can apply a CTL model checker to \(\varphi '\) and the original program \(P\), as shown on line 20. When \({\textsc {ProveCTL}^*} \) returns to its caller Verify, \(a_\theta \) contains the precondition for the outermost temporal property of the original CTL \(^*\) formula \(\theta \). It then only remains to check whether \(a_\theta \) is satisfied by the initial states of the program to complete the verification of the CTL \(^*\) formula. Finally, Path is set to false, so that the above procedure is carried out again when necessary.
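The final steps of both branches can be summarized by the following sketch, in which the CTL model checker, Approximate and QuantElim are passed in as callables and formulae are opaque values. The function `finish_step` and its parameters are hypothetical, and as a simplification we assume that the preconditions of sub-formulae have already been substituted into `phi_prime`.

```python
from typing import Any, Callable, Tuple


def finish_step(phi: Any, phi_prime: Any, is_path: bool, inner_is_path: bool,
                P: Any, P_D: Any,
                ctl: Callable[[Any, Any], Any],
                approximate: Callable[[Any], Any],
                quant_elim: Callable[[Any, Any, Any], Any]) -> Tuple[Any, bool]:
    """Return (precondition, Path flag) once sub-formula preconditions are known."""
    if is_path:
        # Path formula: check its ACTL approximation on the determinized program.
        return ctl(P_D, approximate(phi_prime)), True
    if inner_is_path:
        # Quantifier over a path formula: check on P_D, then eliminate prophecies.
        return quant_elim(phi_prime, phi, ctl(P_D, phi_prime)), False
    # Ordinary CTL state formula: the original program suffices.
    return ctl(P, phi_prime), False
```

The design point is that only the Path flag of the inner result decides whether the model checker runs on \(P\) or on \(P_D\), and whether the result must be collapsed by QuantElim before it can be used as a state precondition.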

Theorem 3

If \({\textsc {Verify}}(\theta ,P)\) returns true then \(P\models \theta \).

Proof

See TR [7], Appendix A.

We note that the implication in Theorem 3 holds in one direction only. That is, failing to prove that a property holds does not imply that its negation holds (though the negation might be proved by negating the formula, converting it to negation normal form, and running our procedure on it). This incompleteness stems from the over-approximation of path formulae by corresponding ACTL formulae: although this over-approximation is checked over \(P_D\), \(P_D\) does not determinize all paths. It is impossible to completely determinize a program, as this would require uncountable branching (in the choice of prophecy variables); countable nondeterminism does not suffice for the determinization of nested loops. For example, suppose that the prophecy variable value entails that an outer loop does not terminate. Now consider all possible options for the number of repetitions of the inner loop. To obtain a completely deterministic program, we would have to prophesy an infinite sequence of natural numbers, one per iteration of the outer loop. The number of such infinite sequences is uncountable.
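The uncountability claim is Cantor's diagonal argument applied to infinite sequences of natural numbers (here, inner-loop repetition counts indexed by outer-loop iterations). The sketch below, with hypothetical names, shows the construction: for any listing of such sequences, the diagonal sequence differs from every listed one, so no countable set of prophecy values can cover them all.

```python
from typing import Callable

Seq = Callable[[int], int]          # an infinite sequence of naturals, i -> value
Enumeration = Callable[[int], Seq]  # a claimed listing: k -> k-th sequence


def diagonal(enum: Enumeration) -> Seq:
    """Cantor's diagonal sequence: differs from the k-th listed sequence at index k."""
    return lambda k: enum(k)(k) + 1


def listing(k: int) -> Seq:
    """Example listing: the k-th sequence is i -> k * i."""
    return lambda i: k * i


d = diagonal(listing)
print([d(k) for k in range(5)])  # [1, 2, 5, 10, 17]
```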

5 Evaluation

In this section we discuss the results of our experiments with an implementation of the procedure from Fig. 2(c). Our implementation is built as an extension to the open source project T2, which uses a safety prover similar to Impact [24] alongside previously published techniques for discovering ranking functions, etc. [9, 25] to prove both liveness and safety properties. The tool was executed on an Intel x64-based 2.8 GHz single-core processor. The format in which we interpret and parse a program’s commands can be found in [11].

Fig. 4. Experimental evaluations of infinite-state programs drawn from the Windows OS, PgSQL, and 8 toy examples. There are no competing tools available for comparison.

We have extracted a set of CTL \(^*\) problems from industrial code bases. Examples were taken from the I/O subsystems of the Windows OS kernel, the back-end infrastructure of the PostgreSQL database server, and the Apache web server. CTL \(^*\) allows us to express “possibility" properties, such as the viability of a system, stating that any reachable state can spawn a fair computation. Additionally, we demonstrate that we can now verify properties involving existential system stabilization, stating that an event can eventually become true and stay true from any reachable state. For example, “OS frag. 1", “OS frag. 3", “PgSQL arch 1", and “Bench 2" are verified using such properties, described in detail in Sect. 1. We also include a few toy examples to further demonstrate the expressiveness of CTL \(^*\) and its usefulness in verifying programs.

Since our benchmarks are infinite-state programs, the only existing automated CTL \(^*\) verification tool for the finite-state setting [18] is not applicable. In Fig. 4 we display the results of our benchmarks. For each program and its corresponding CTL \(^*\) property, we display the number of lines of code (LoC) and report the time it took to verify the CTL \(^*\) property in seconds (Time column). The “Res.” column indicates the result of our tool: a \(\checkmark \) indicates that the tool was able to verify the property, and a \(\times \) indicates that the tool failed to prove it. The symbol “–” in the result column indicates that no result was determined due to a timeout, and T/O indicates a timeout or memory exception. A timeout is triggered if verification of an experiment exceeds 3000 seconds. Note that in various cases we verify the same program against a CTL \(^*\) property and its negation; our tool thus allows us to prove each of the properties as well as disprove each of their negations.

Our experiments demonstrate the practical viability of our approach. Our runtimes are well within the range of performance previously exhibited by specialized tools such as [3, 8, 10–12], which can only verify significantly less expressive properties of infinite-state programs. Our tool verified or invalidated CTL \(^*\) properties in accordance with their expected results for all but one of the benchmarks. The exception is due to the aforementioned limitation: our determinization technique, which relies on countable nondeterminism, is not complete.

6 Concluding Remarks

We have introduced the first known fully automatic method capable of proving CTL \(^*\) properties of infinite-state (integer) programs. This allows us, for the first time, to automatically verify properties of programs that mix branching-time and linear-time temporal operators. We have developed an implementation capable of automatically proving properties of programs that no tool could previously prove. The method underlying our tool uses a symbolic representation that facilitates reasoning about the interaction between sets of states and sets of paths.