Abstract
The secure information flow problem, which checks whether low-security outputs of a program are influenced by high-security inputs, has many applications in verifying security properties in programs. In this paper we present lazy self-composition, an approach for verifying secure information flow. It is based on self-composition, where two copies of a program are created on which a safety property is checked. However, rather than an eager duplication of the given program, it uses duplication lazily to reduce the cost of verification. This lazy self-composition is guided by an interplay between symbolic taint analysis on an abstract (single-copy) model and safety verification on a refined (two-copy) model. We propose two verification methods based on lazy self-composition. The first is a CEGAR-style procedure, where the abstract model associated with taint analysis is refined, on demand, by using a model generated by lazy self-composition. The second is a method based on bounded model checking, where taint queries are generated dynamically during program unrolling to guide lazy self-composition and to conclude an adequate bound for correctness. We have implemented these methods on top of the SeaHorn verification platform and our evaluations show the effectiveness of lazy self-composition.
This work was supported in part by NSF Grant 1525936.
1 Introduction
Many security properties can be cast as the problem of verifying secure information flow. A standard approach to verifying secure information flow is to reduce it to a safety verification problem on a "self-composition" of the program, i.e., two "copies" of the program are created [5] and analyzed. For example, to check for information leaks or noninterference [17], low-security (public) inputs are initialized to identical values in the two copies of the program, while high-security (confidential) inputs are unconstrained and can take different values. The safety check ensures that in all executions of the two-copy program, the values of the low-security (public) outputs are identical, i.e., there is no information leak from confidential inputs to public outputs. The self-composition approach is useful for checking general hyperproperties [11], and has been used in other applications, such as verifying constant-time code for security [1] and k-safety properties of functions like injectivity and monotonicity [32].
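As a concrete illustration of this reduction (a toy sketch over finite domains, with hypothetical functions not taken from any benchmark), noninterference of a small function can be checked by exhaustively exploring its self-composition: low inputs are constrained to be equal in both copies, high inputs range freely, and the safety check requires the low outputs to agree:

```python
from itertools import product

def leaky(high: int, low: int) -> int:
    # the low-security output depends on the high-security input
    return low + (1 if high > 0 else 0)

def secure(high: int, low: int) -> int:
    # both branches yield the same value, so nothing leaks
    return low + 1 if high > 0 else 1 + low

def noninterferent(prog, highs, lows) -> bool:
    """Self-composition over a finite domain: run two copies with
    identical low inputs and arbitrary high inputs, and require that
    the low-security outputs coincide."""
    for low in lows:
        for h1, h2 in product(highs, repeat=2):
            if prog(h1, low) != prog(h2, low):
                return False  # a reachable "bad state": outputs differ
    return True

print(noninterferent(secure, range(-4, 5), range(-4, 5)))  # True
print(noninterferent(leaky, range(-4, 5), range(-4, 5)))   # False
```

Real verifiers of course check this symbolically over all inputs; the exhaustive loop merely makes the two-copy formulation tangible.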
Although the self-composition reduction is sound and complete, it is challenging in practice to check safety properties on two copies of a program. There have been many efforts to reduce the cost of verification on self-composed programs, e.g., by use of type-based analysis [33], constructing product programs with aligned fragments [4], lockstep execution of loops [32], transforming Horn clause rules [14, 24], etc. The underlying theme in these efforts is to make it easier to derive relational invariants between the two copies, e.g., by keeping corresponding variables in the two copies near each other.
In this paper, we aim to improve the self-composition approach by making it lazier, in contrast to an eager duplication into two copies of a program. Specifically, we use symbolic taint analysis to track the flow of information from high-security inputs to other program variables. (This is similar to dynamic taint analysis [30], but covers all possible inputs due to static verification.) This analysis works on an abstract model of a single copy of the program and employs standard model checking techniques for achieving precision and path sensitivity. When this abstraction shows a counterexample, we refine it using on-demand duplication of relevant parts of the program. Thus, our lazy self-composition approach is guided by an interplay between symbolic taint analysis on an abstract (single-copy) model and safety verification on a refined (two-copy) model.
We describe two distinct verification methods based on lazy self-composition. The first is an iterative procedure for unbounded verification based on counterexample-guided abstraction refinement (CEGAR) [9]. Here, the taint analysis provides a sound over-approximation for secure information flow, i.e., if a low-security output is proved to be untainted, then it is guaranteed to not leak any information. However, even a path-sensitive taint analysis can sometimes lead to "false alarms", i.e., a low-security output is tainted, but its value is unaffected by high-security inputs. For example, this can occur when a branch depends on a tainted variable, but the same (semantic, and not necessarily syntactic) value is assigned to a low-security output on both branches. Such false alarms raised by taint analysis are then resolved by lazily duplicating relevant parts of the program and performing a safety check on the composed two-copy program. Furthermore, we use relational invariants derived on the latter to strengthen the abstraction within the iterative procedure.
Our second method also takes a similar abstraction-refinement view, but in the framework of bounded model checking (BMC) [6]. Here, we dynamically generate taint queries (in the abstract single-copy model) during program unrolling, and use their results to simplify the duplication for self-composition (in the two-copy model). Specifically, the second copy duplicates the statements (update logic) only if the taint query shows that the updated variable is possibly tainted. Furthermore, we propose a specialized early termination check for the BMC-based method. In many secure programs, sensitive information is propagated in a localized context, but conditions exist that squash its propagation any further. We formulate the early termination check as a taint check on all live variables at the end of a loop body, i.e., if no live variable is tainted, then we can conclude that the program is secure without further loop unrolling. (This is under the standard assumption that inputs are tainted in the initial state. The early termination check can be suitably modified if tainted inputs are allowed to occur later.) Since our taint analysis is precise and path-sensitive, this approach can be beneficial in practice by stopping loop unrolling soon after the point where all taint has been squashed.
We have implemented these methods in the SeaHorn verification platform [18], which represents programs as CHC (Constrained Horn Clause) rules. Our prototype for taint analysis is flexible, with a fully symbolic encoding of the taint policy (i.e., rules for taint generation, propagation, and removal). It fully leverages SMT-based model checking techniques for precise taint analysis. Our prototypes allow rich security specifications in terms of annotations on low/high-security variables and locations in arrays, and predicates that allow information downgrading in specified contexts.
We present an experimental evaluation on benchmark examples. Our results clearly show the benefits of lazy self-composition vs. eager self-composition, where the former is much faster and allows verification to complete on larger examples. Our initial motivation in proposing the two verification methods was that we would find examples where one or the other method is better: we expect that easier proofs are likely to be found by the CEGAR-based method, and easier bugs by the BMC-based method. As it turns out, most of our benchmark examples so far are easy to handle for both methods. We believe that our general approach of lazy self-composition would be beneficial in other verification methods as well, and both our methods show its effectiveness in practice.
To summarize, this paper makes the following contributions.

We present lazy self-composition, an approach to verifying secure information flow that reduces verification cost by exploiting the interplay between a path-sensitive symbolic taint analysis and safety checking on a self-composed program.

We present IfcCEGAR, a procedure for unbounded verification of secure information flow based on lazy self-composition using the CEGAR paradigm. IfcCEGAR starts with a taint analysis abstraction of information flow and iteratively refines this abstraction using self-composition. It is tailored toward proving that programs have secure information flow.

We present IfcBMC, a procedure for bounded verification of secure information flow. As the program is being unrolled, IfcBMC uses dynamic symbolic taint checks to determine which parts of the program need to be duplicated. This method is tailored toward bug-finding.

We develop prototype implementations of IfcCEGAR and IfcBMC and present an experimental evaluation of these methods on a set of benchmarks and microbenchmarks. Our results demonstrate that IfcCEGAR and IfcBMC easily outperform an eager self-composition that uses the same backend verification engines.
2 Motivating Example
Listing 1 shows a snippet from a function that performs multiword multiplication. The code snippet is instrumented to count the number of iterations of the inner loop that are executed in \(\mathtt {bigint\_shiftleft}\) and \(\mathtt {bigint\_add}\) (not shown for brevity). These iterations are counted in the variable \(\mathtt {steps}\). The security requirement is that \(\mathtt {steps}\) must not depend on the secret values in the array \(\mathtt {a}\); array \(\mathtt {b}\) is assumed to be public.
Static analyses, including those based on security types, will conclude that the variable \(\mathtt {steps}\) is "high-security." This is because \(\mathtt {steps}\) is assigned in a conditional branch that depends on the high-security variable \(\mathtt {bi}\). However, this code is in fact safe because \(\mathtt {steps}\) is incremented by the same value in both branches of the conditional statement.
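Listing 1 itself is not reproduced here; the following Python model (hypothetical names, abstracting the C snippet described above) captures the pattern: \(\mathtt {steps}\) is updated under a branch on a secret bit, yet both branches add the same amount, so the observable count is independent of the secret.

```python
def count_steps(a_bits, width):
    """Models the instrumented multiword multiply: 'steps' counts the
    inner-loop iterations of the (elided) bigint_shiftleft/bigint_add
    helpers; 'width' plays the role of the public operand size."""
    steps = 0
    for bi in a_bits:          # bi is derived from the secret array a
        if bi:                 # taint analysis flags 'steps' here...
            steps += width     # ...but the increment is the same
        else:
            steps += width     # ...in both branches
    return steps

# identical low input (width), different secrets: same observable count
assert count_steps([1, 0, 1, 1], 64) == count_steps([0, 0, 0, 0], 64)
```

This is exactly the "semantic, not syntactic" equality that a pure taint analysis misses and that the refinement step recovers.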
Our lazy self-composition will handle this example by first using a symbolic taint analysis to conclude that the variable \(\mathtt {steps}\) is tainted. It will then self-compose only those parts of the program related to the computation of \(\mathtt {steps}\), and discover that it is set to identical values in both copies, thus proving that the program is secure.
Now consider the case when the code in Listing 1 is used to multiply two "bigints" of differing widths, e.g., a 512-bit integer is multiplied with a 2048-bit integer. If this occurs, the upper 1536 bits of \(\mathtt {a}\) will all be zeros, and \(\mathtt {bi}\) will not be a high-security variable for these iterations of the loop. Such a scenario can benefit from early termination in our BMC-based method: our analysis will determine that no tainted value flows to the low-security variable \(\mathtt {steps}\) after iteration 512 and will immediately terminate the analysis.
3 Preliminaries
We consider First Order Logic modulo a theory \(\mathcal {T}\) and denote it by \(FOL(\mathcal {T})\). Given a program P, we define a safety verification problem w.r.t. P as a transition system \(M=\langle X, Init (X), Tr (X,X'), Bad (X)\rangle \) where X denotes a set of (uninterpreted) constants, representing program variables; \( Init , Tr \) and \( Bad \) are (quantifier-free) formulas in \(FOL(\mathcal {T})\) representing the initial states, transition relation and bad states, respectively. The states of a transition system correspond to structures over a signature \(\varSigma = \varSigma _{\mathcal {T}} \cup X\). We write \( Tr (X,X')\) to denote that \( Tr \) is defined over the signature \(\varSigma _{\mathcal {T}} \cup X \cup X'\), where X is used to represent the pre-state of a transition, and \(X' = \{a' {\mid } a \in X\}\) is used to represent the post-state.
A safety verification problem is to decide whether a transition system M is SAFE or UNSAFE. We say that M is UNSAFE iff there exists a number N such that the following formula is satisfiable:
$$ Init (X_0)\wedge \left( \bigwedge _{i=0}^{N-1} Tr (X_i,X_{i+1})\right) \wedge Bad (X_N)$$
where \(X_i = \{a_i {\mid } a \in X\}\) is a copy of the program variables (uninterpreted constants) used to represent the state of the system after the execution of i steps.
When M is UNSAFE and \(s_N\in Bad \) is reachable, the path from \(s_0\in Init \) to \(s_N\) is called a counterexample (CEX).
A transition system M is SAFE iff the transition system has no counterexample, of any length. Equivalently, M is SAFE iff there exists a formula \( Inv \), called a safe inductive invariant, that satisfies: (i) \( Init (X) \rightarrow Inv(X)\), (ii) \( Inv (X) \wedge Tr (X,X') \rightarrow Inv (X')\), and (iii) \( Inv (X) \rightarrow \lnot Bad (X)\).
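As a small worked instance of these three conditions (a toy system chosen purely for illustration, not from the paper), take \(X = \{c\}\) over the domain \(\{0,\ldots ,7\}\), \( Init : c = 0\), \( Tr : c' = (c+2) \bmod 8\), and \( Bad : c = 3\). The candidate invariant "c is even" can be checked exhaustively:

```python
STATES = range(8)

def init(c): return c == 0
def tr(c, c2): return c2 == (c + 2) % 8
def bad(c): return c == 3
def inv(c): return c % 2 == 0   # candidate safe inductive invariant

# (i) Init(X) -> Inv(X)
assert all(inv(c) for c in STATES if init(c))
# (ii) Inv(X) /\ Tr(X,X') -> Inv(X')
assert all(inv(c2) for c in STATES if inv(c)
                   for c2 in STATES if tr(c, c2))
# (iii) Inv(X) -> not Bad(X)
assert all(not bad(c) for c in STATES if inv(c))
```

All three assertions hold, so the toy system is SAFE with \( Inv \) as its certificate.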
In SAT-based model checking (e.g., based on IC3 [7] or interpolants [23, 34]), the verification procedure maintains an inductive trace of formulas \([F_0(X), \ldots , F_N(X)]\) that satisfy: (i) \( Init (X) \rightarrow F_0(X)\), (ii) \(F_i(X) \wedge Tr (X,X') \rightarrow F_{i+1}(X')\) for every \(0\le i < N\), and (iii) \(F_i(X) \rightarrow \lnot Bad (X)\) for every \(0\le i\le N\). A trace \([F_0, \ldots , F_N]\) is closed if \(\exists 1 \le i \le N \cdot F_i \Rightarrow \left( \bigvee _{j=0}^{i-1}F_j\right) \). There is an obvious relationship between the existence of closed traces and safety of a transition system: a transition system M is SAFE iff it admits a safe closed trace. Thus, safety verification is reduced to searching for a safe closed trace or finding a CEX.
4 Information Flow Analysis
Let P be a program over a set of program variables X. Recall that \( Init (X)\) is a formula describing the initial states and \( Tr (X,X')\) a transition relation. We assume a "stuttering" transition relation, namely, \( Tr \) is reflexive and can therefore nondeterministically either move to the next state or stay in the same state. Let us assume that \(H \subset X\) is the set of high-security variables and \(L := X\backslash H\) is the set of low-security variables.
For each \(x \in L\), let \(Obs_x(X)\) be a predicate over program variables X that determines when variable x is adversary-observable. The precise definition of \(Obs_x(X)\) depends on the threat model being considered. A simple model would be that for each low variable \(x \in L\), \(Obs_x(X)\) holds only at program completion – this corresponds to a threat model where the adversary can run a program that operates on some confidential data and observe its public (low-security) outputs after completion. A more sophisticated definition of \(Obs_x(X)\) could consider, for example, a concurrently executing adversary. Appropriate definitions of \(Obs_x(X)\) can also model declassification [29], by setting \(Obs_x(X)\) to be false in program states where the declassification of x is allowed.
The information flow problem checks whether there exists an execution of P such that the value of variables in H affects a variable \(x \in L\) in some state where the predicate \(Obs_x(X)\) holds. Intuitively, information flow analysis checks if low-security variables "leak" information about high-security variables.
We now describe our formulations of two standard techniques that have been used to perform information flow analysis. The first is based on taint analysis [30], but we use a symbolic (rather than a dynamic) analysis that tracks taint in a path-sensitive manner over the program. The second is based on self-composition [5], where two copies of the program are created and a safety property is checked over the composed program.
4.1 Symbolic Taint Analysis
When using taint analysis for checking information flow, we mark high-security variables with a "taint" and check if this taint can propagate to low-security variables. The propagation of taint through program variables of P is determined by both assignments and the control structure of P. In order to perform precise taint analysis, we formulate it as a safety verification problem. For this purpose, for each program variable \(x\in X\), we introduce a new "taint" variable \(x_t\). Let \(X_t := \{x_t \mid x\in X\}\) be the set of taint variables, where each \(x_t\in X_t\) is of sort Boolean. Let us define a transition system \(M_t := \langle Y, Init _t, Tr _t, Bad _t\rangle \) where \(Y := X\cup X_t\) and
$$ Init _t := Init (X)\wedge \bigwedge _{x\in H} x_t\wedge \bigwedge _{x\in L}\lnot x_t$$
$$ Tr _t(Y,Y') := Tr (X,X')\wedge \hat{ Tr }(X_t,X_t')$$
$$ Bad _t := \bigvee _{x\in L}\left( Obs_x(X)\wedge x_t\right) $$
Since taint analysis tracks information flow from high-security to low-security variables, variables in \(H_t\) are initialized to \( true \) while variables in \(L_t\) are initialized to \( false \). W.l.o.g., let us denote the state update for a program variable \(x\in X\) as: \(x' = cond(X) \; ? \; \varphi _1(X)\; :\; \varphi _2(X)\). Let \(\varphi \) be a formula over \(\varSigma \). We capture the taint of \(\varphi \) by:
$$\varTheta (\varphi ) := \bigvee \{x_t \mid x\in X \text { occurs in } \varphi \}$$
Thus, \(\hat{ Tr }(X_t,X_t')\) is defined as: \( \bigwedge \limits _{x_t\in X_t} x_t' = \varTheta (cond)\vee \left( cond\; ? \; \varTheta (\varphi _1) \; : \; \varTheta (\varphi _2) \right) \)
Intuitively, taint may propagate from \(x_2\) to \(x_1\) either when \(x_1\) is assigned an expression that involves \(x_2\), or when an assignment to \(x_1\) is controlled by \(x_2\). The bad states (\( Bad _t\)) are all states where a low-security variable is tainted and observable.
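A direct reading of this propagation rule for an update \(x' = cond \; ? \; \varphi _1 : \varphi _2\) can be sketched as follows (illustrative Python, with \(\varTheta \) computed syntactically over the variables occurring in each expression):

```python
def theta(expr_vars, taint):
    """Theta(phi): the expression is tainted iff some variable
    occurring in it is tainted."""
    return any(taint[v] for v in expr_vars)

def update_taint(taint, x, cond_vars, phi1_vars, phi2_vars, cond_holds):
    """x_t' = Theta(cond) \\/ (cond ? Theta(phi1) : Theta(phi2))"""
    branch = theta(phi1_vars if cond_holds else phi2_vars, taint)
    new = dict(taint)
    new[x] = theta(cond_vars, taint) or branch
    return new

# 'steps' is assigned an untainted expression, but under a tainted guard:
t = {"bi": True, "steps": False, "one": False}
t = update_taint(t, "steps", cond_vars=["bi"],
                 phi1_vars=["steps", "one"], phi2_vars=["steps", "one"],
                 cond_holds=True)
assert t["steps"]  # the implicit flow through the branch taints 'steps'
```

This is precisely the source of false alarms discussed earlier: the rule must taint \(\mathtt {steps}\) even when both branches assign the same semantic value.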
4.2 Self-composition
When using self-composition, information flow is tracked over an execution of two copies of the program, P and \(P_d\). Let us denote \(X_d := \{x_d \mid x\in X\}\) as the set of program variables of \(P_d\), and let \( Init (X_d)\) and \( Tr (X_d,X_d')\) denote the initial states and transition relation of \(P_d\). Note that these are computed from \( Init (X)\) and \( Tr (X,X')\) by means of substitution, namely, by substituting every occurrence of \(x\in X\) or \(x'\in X'\) with \(x_d\in X_d\) or \(x_d'\in X_d'\), respectively. Similarly to taint analysis, we formulate information flow over a self-composed program as a safety verification problem \(M_d := \langle Z, Init _d, Tr _d, Bad _d\rangle \) where \(Z := X\cup X_d\) and
$$ Init _d := Init (X)\wedge Init (X_d)\wedge \bigwedge _{x\in L}(x = x_d)$$
$$ Tr _d(Z,Z') := Tr (X,X')\wedge Tr (X_d,X_d')$$
$$ Bad _d := \bigvee _{x\in L}\left( Obs_x(X)\wedge Obs_x(X_d)\wedge \lnot (x = x_d)\right) $$
In order to track information flow, variables in \(L_d\) are initialized to be equal to their counterparts in L, while variables in \(H_d\) remain unconstrained. A leak is captured by the bad states (i.e., \( Bad _d\)). More precisely, there exists a leak iff there exists an execution of \(M_d\) that results in a state where \(Obs_x(X)\) and \(Obs_x(X_d)\) hold and \(x \ne x_d\) for some low-security variable \(x\in L\).
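Over a small finite domain, \(M_d\) can even be explored explicitly. The sketch below (a hypothetical toy system, not one of the benchmarks) searches the composed state space from states satisfying the initial equality constraint for a state where a low variable differs across the two copies:

```python
from itertools import product

HIGHS, LOWS = (0, 1), (0, 1, 2, 3)

def step(x, h):
    # one program transition: a low variable updated under a high branch;
    # both branches perform the same update, so the system is secure
    return (x + 1) % 4 if h else (x + 1) % 4

def leak_reachable(step_fn, depth=8):
    """Breadth-first search over the self-composed system: low variables
    start equal in both copies, high inputs differ arbitrarily; report
    True if a state with x != x_d (i.e., Bad_d) is reachable."""
    frontier = {(x, x) for x in LOWS}            # Init_d: x = x_d
    for _ in range(depth):
        nxt = set()
        for (x, xd) in frontier:
            for h, hd in product(HIGHS, repeat=2):
                nxt.add((step_fn(x, h), step_fn(xd, hd)))
        if any(x != xd for (x, xd) in nxt):      # Bad_d reached
            return True
        frontier = nxt
    return False

assert not leak_reachable(step)                              # secure
assert leak_reachable(lambda x, h: (x + 1) % 4 if h else x)  # leaks
```

The leaky variant updates the low variable only under the high branch, so the two copies diverge after one step.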
5 Lazy Self-composition for Information Flow Analysis
In this section, we introduce lazy self-composition for information flow analysis. It is based on an interplay between symbolic taint analysis on a single copy and safety verification on a self-composition, both of which were described in the previous section.
Recall that taint analysis is imprecise for determining secure information flow in the sense that it may report spurious counterexamples, namely, spurious leaks. In contrast, self-composition is precise, but less efficient: the fact that self-composition requires a duplication of the program often hinders its performance. The main motivation for lazy self-composition is to target both efficiency and precision.
Intuitively, the model for symbolic taint analysis \(M_t\) can be viewed as an abstraction of the self-composed model \(M_d\), where the Boolean variables in \(M_t\) are predicates tracking the states where \(x\ne x_d\) for some \(x\in X\). This intuition is captured by the following statement: \(M_t\) over-approximates \(M_d\).
Corollary 1
If there exists a path in \(M_d\) from \( Init _d\) to \( Bad _d\) then there exists a path in \(M_t\) from \( Init _t\) to \( Bad _t\).
Corollary 2
If there exists no path in \(M_t\) from \( Init _t\) to \( Bad _t\) then there exists no path in \(M_d\) from \( Init _d\) to \( Bad _d\).
This abstraction-based view relating symbolic taint analysis and self-composition can be exploited in different verification methods for checking secure information flow. In this paper, we focus on two: a CEGAR-based method (IfcCEGAR) and a BMC-based method (IfcBMC). These methods based on lazy self-composition are now described in detail.
5.1 IFCCEGAR
We make use of the fact that \(M_t\) can be viewed as an abstraction w.r.t. \(M_d\), and propose an abstraction-refinement paradigm for secure information flow analysis. In this setting, \(M_t\) is used to find a possible counterexample, i.e., a path that leaks information. Then, \(M_d\) is used to check whether this counterexample is spurious or real. In case the counterexample is found to be spurious, IfcCEGAR uses the proof that shows why the counterexample is not possible in \(M_d\) to refine \(M_t\).
A sketch of IfcCEGAR appears in Algorithm 1. Recall that we assume that solving a safety verification problem is done by maintaining an inductive trace. We denote the traces for \(M_t\) and \(M_d\) by \(\varvec{G}=[G_0,\ldots ,G_k]\) and \(\varvec{H}=[H_0,\ldots ,H_k]\), respectively. IfcCEGAR starts by initializing \(M_t\), \(M_d\) and their respective traces \(\varvec{G}\) and \(\varvec{H}\) (lines 1–4). The main loop of IfcCEGAR (lines 5–18) starts by looking for a counterexample over \(M_t\) (line 6). In case no counterexample is found, IfcCEGAR declares there are no leaks and returns SAFE.
If a counterexample \(\pi \) is found in \(M_t\), IfcCEGAR first updates the trace of \(M_d\), i.e. \(\varvec{H}\), by rewriting \(\varvec{G}\) (line 10). In order to check if \(\pi \) is spurious, IfcCEGAR creates a new safety verification problem \(M_c\), a version of \(M_d\) constrained by \(\pi \) (line 11) and solves it (line 12). If \(M_c\) has a counterexample, IfcCEGAR returns UNSAFE. Otherwise, \(\varvec{G}\) is updated by \(\varvec{H}\) (line 16) and \(M_t\) is refined such that \(\pi \) is ruled out (line 17).
The above gives a high-level overview of how IfcCEGAR operates. We now describe the functions \(\texttt {ReWrite}\), \(\texttt {Constraint}\) and \(\texttt {Refine}\) in more detail. We note that these functions can be designed and implemented in several different ways; in what follows we describe some possible choices.
Proof-Based Abstraction. Let us assume that when solving \(M_t\) a counterexample \(\pi \) of length k is found and an inductive trace \(\varvec{G}\) is computed. Following a proof-based abstraction approach, \(\texttt {Constraint}()\) uses the length of \(\pi \) to bound the length of possible executions in \(M_d\) by k. Intuitively, this is similar to bounding the length of the computed inductive trace over \(M_d\).
In case \(M_c\) has a counterexample, a real leak (of length k) is found. Otherwise, since \(M_c\) considers all possible executions of \(M_d\) of length k, IfcCEGAR deduces that there are no counterexamples of length k. In particular, the counterexample \(\pi \) is ruled out. IfcCEGAR therefore uses this fact to refine \(M_t\) and \(\varvec{G}\).
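The overall loop can be illustrated by a deliberately simplified, finite-domain rendition of this procedure (all names are illustrative): a syntactic taint pass stands in for the symbolic taint analysis, exhaustive enumeration stands in for the model checker, and refinement records variables proved equal across the two copies.

```python
def taint_analysis(stmts, tainted, clean):
    """Abstract single-copy model: propagate taint syntactically through
    straight-line statements (lhs, rhs_vars, fn), except that variables
    already proved equal across copies ('clean') have their taint forced
    to false -- mirroring the refinement of Tr_t."""
    t = set(tainted)
    for (lhs, rhs_vars, _) in stmts:
        if lhs in clean:
            t.discard(lhs)
        elif t & set(rhs_vars):
            t.add(lhs)
        else:
            t.discard(lhs)
    return t

def selfcompose_check(stmts, var, highs, lows):
    """Refinement query on M_d: is 'var' equal in both copies for all
    high inputs, given equal low inputs? (exhaustive toy check)"""
    def run(h, l):
        env = {"h": h, "l": l}
        for (lhs, rhs_vars, fn) in stmts:
            env[lhs] = fn(*[env[v] for v in rhs_vars])
        return env[var]
    return all(run(h1, l) == run(h2, l)
               for l in lows for h1 in highs for h2 in highs)

def ifc_cegar(stmts, low_out, highs, lows):
    clean = set()
    while True:
        t = taint_analysis(stmts, {"h"}, clean)
        if low_out not in t:
            return "SAFE"                       # abstraction proves it
        if not selfcompose_check(stmts, low_out, highs, lows):
            return "UNSAFE"                     # real leak in M_d
        clean.add(low_out)                      # refine with the proof

# 'steps' is updated under a branch on h, but both branches add the same
# value: the abstraction raises a false alarm, refinement discharges it
prog = [("steps", ["h", "l"], lambda h, l: l + 1 if h else 1 + l)]
assert ifc_cegar(prog, "steps", highs=(0, 1), lows=range(4)) == "SAFE"
```

The real procedure works on inductive traces and symbolic models rather than enumerated states, but the control flow (abstract check, spuriousness check, refine, repeat) is the same.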
Inductive Trace Rewriting. Consider the set of program variables X, taint variables \(X_t\), and self-composition variables \(X_d\). As noted above, \(M_t\) over-approximates \(M_d\). Intuitively, it may mark a variable x as tainted when x does not leak information. Equivalently, if a variable x is found to be untainted in \(M_t\), then it is known to also not leak information in \(M_d\). More formally, the following relation holds: \(\lnot x_t\rightarrow (x = x_d)\).
This gives us a procedure for rewriting a trace over \(M_t\) to a trace over \(M_d\). Let \(\varvec{G}=[G_0,\ldots ,G_k]\) be an inductive trace over \(M_t\). Considering the definition of \(M_t\), \(\varvec{G}\) can be decomposed and rewritten as: \(G_i(Y) := \bar{G}_i(X)\wedge \bar{G}^t_i(X_t)\wedge \psi (X,X_t) \). Namely, \(\bar{G}_i(X)\) and \(\bar{G}^t_i(X_t)\) are subformulas of \(G_i\) over only X and \(X_t\) variables, respectively, and \(\psi (X,X_t) \) is the part connecting X and \(X_t\).
Since \(\varvec{G}\) is an inductive trace, \(G_i(Y)\wedge Tr _t(Y,Y')\rightarrow G_{i+1}(Y')\) holds. Following the definition of \( Tr _t\) and the above decomposition of \(G_i\), the following holds:
$$\bar{G}_i(X)\wedge Tr (X,X')\rightarrow \bar{G}_{i+1}(X')$$
Let \(\varvec{H}= [H_0,\ldots ,H_k]\) be a trace w.r.t. \(M_d\). We define the update of \(\varvec{H}\) by \(\varvec{G}\) as the trace \(\varvec{H}^* = [H^*_0,\ldots ,H^*_k]\), where:
$$H^*_i(Z) := H_i(Z)\wedge \bigwedge \{(x = x_d) \mid G_i\Rightarrow \lnot x_t\}$$
Intuitively, if a variable \(x\in X\) is known to be untainted in \(M_t\), using Corollary 2 we conclude that \(x = x_d\) in \(M_d\).
A similar update can be defined when updating a trace \(\varvec{G}\) w.r.t. \(M_t\) by a trace \(\varvec{H}\) w.r.t. \(M_d\). In this case, we use the following relation: \(\lnot (x = x_d)\rightarrow x_t\). Let \(\varvec{H}=[H_0(Z),\ldots ,H_k(Z)]\) be the inductive trace w.r.t. \(M_d\). \(\varvec{H}\) can be decomposed and written as \(H_i(Z) := \bar{H}_i(X)\wedge \bar{H}_i^d(X_d)\wedge \phi (X,X_d)\).
Due to the definition of \(M_d\) and an inductive trace, the following holds:
$$\bar{H}_i(X)\wedge Tr (X,X')\rightarrow \bar{H}_{i+1}(X')$$
We can therefore update a trace \(\varvec{G}= [G_0,\ldots ,G_k]\) w.r.t. \(M_t\) by defining the trace \(\varvec{G}^*=[G^*_0,\ldots ,G^*_k]\), where:
$$G^*_i(Y) := G_i(Y)\wedge \bigwedge \{x_t \mid H_i\Rightarrow \lnot (x = x_d)\}$$
Updating \(\varvec{G}\) by \(\varvec{H}\), and vice versa, as described above is based on the fact that \(M_t\) over-approximates \(M_d\) w.r.t. tainted variables (namely, Corollaries 1 and 2). It is therefore important to note that \(\varvec{G}^*\), in particular, does not "gain" more precision due to this process.
Lemma 1
Let \(\varvec{G}\) be an inductive trace w.r.t. \(M_t\) and \(\varvec{H}\) an inductive trace w.r.t. \(M_d\). Then, the updated \(\varvec{H}^*\) and \(\varvec{G}^*\) are inductive traces w.r.t. \(M_d\) and \(M_t\), respectively.
Refinement. Recall that in the current scenario, a counterexample was found in \(M_t\), and was shown to be spurious in \(M_d\). This fact can be used to refine both \(M_t\) and \(\varvec{G}\).
As a first step, we observe that if \(x = x_d\) in \(M_d\), then \(\lnot x_t\) should hold in \(M_t\). However, since \(M_t\) is an over-approximation, it may allow x to be tainted, namely, allow \(x_t\) to evaluate to \( true \).
In order to refine \(M_t\) and \(\varvec{G}\), we define a strengthening procedure for \(\varvec{G}\), which resembles the updating procedure described in the previous section. Let \(\varvec{H}= [H_0,\ldots ,H_k]\) be a trace w.r.t. \(M_d\) and \(\varvec{G}= [G_0,\ldots , G_k]\) be a trace w.r.t. \(M_t\); then the strengthening of \(\varvec{G}\) is denoted as \(\varvec{G}^r = [G^r_0,\ldots ,G^r_k]\) such that:
$$G^r_i(Y) := G_i(Y)\wedge \bigwedge \{\lnot x_t \mid H_i\Rightarrow (x = x_d)\}$$
The above gives us a procedure for strengthening \(\varvec{G}\) by using \(\varvec{H}\). Note that since \(M_t\) is an over-approximation of \(M_d\), it may allow a variable \(x\in X\) to be tainted while in \(M_d\) (and therefore in \(\varvec{H}\)), \(x = x_d\). As a result, after strengthening, \(\varvec{G}^r\) is not necessarily an inductive trace w.r.t. \(M_t\), namely, \(G^{r}_i\wedge Tr _t\rightarrow G^{r}_{i+1}{'}\) does not necessarily hold. In order to make \(\varvec{G}^r\) an inductive trace, \(M_t\) must be refined.
Let us assume that \(G^{r}_i\wedge Tr _t\rightarrow G^{r}_{i+1}{'}\) does not hold, i.e., \(G^r_i\wedge Tr _t\wedge \lnot G^r_{i+1}{'}\) is satisfiable. Considering the way \(\varvec{G}^r\) is strengthened, there exists \(x\in X\) such that \(G^r_i\wedge Tr _t\wedge x_t'\) is satisfiable and \(G^r_{i+1}\Rightarrow \lnot x_t\). The refinement step is defined by:
$$ Tr ^r_t := Tr _t\wedge \left( G^r_i(Y)\rightarrow \lnot x_t'\right) $$
This refinement step changes the next state function of \(x_t\) such that whenever \(G_i\) holds, \(x_t\) is forced to be \( false \) at the next time frame.
Lemma 2
Let \(\varvec{G}^r\) be a strengthened trace, and let \(M^r_t\) be the result of refinement as defined above. Then, \(\varvec{G}^r\) is an inductive trace w.r.t \(M^r_t\).
Theorem 1
Let \(\mathfrak {A}\) be a sound and complete model checking algorithm w.r.t. \(FOL(\mathcal {T})\) for some \(\mathcal {T}\), such that \(\mathfrak {A}\) maintains an inductive trace. Assuming IfcCEGAR uses \(\mathfrak {A}\), then IfcCEGAR is both sound and complete.
Proof
(Sketch). Soundness follows directly from the soundness of taint analysis. For completeness, assume \(M_d\) is SAFE. Due to our assumption that \(\mathfrak {A}\) is sound and complete, \(\mathfrak {A}\) emits a closed inductive trace \(\varvec{H}\). Intuitively, assuming \(\varvec{H}\) is of size k, then the next state function of every taint variable in \(M_t\) can be refined to be a constant false after a specific number of steps. Then, \(\varvec{H}\) can be translated to a closed inductive trace \(\varvec{G}\) over \(M_t\) by following the above presented formalism. Using Lemma 2 we can show that a closed inductive trace exists for the refined taint model.
5.2 IFCBMC
In this section we introduce a different method, based on Bounded Model Checking (BMC) [6], that uses lazy self-composition for solving the information flow security problem. This approach is described in Algorithm 2. In addition to the program P and the specification of high-security variables H, it uses an extra parameter BND that limits the maximum number of loop unrolls performed on the program P. (Alternatively, one can fall back to an unbounded verification method after BND is reached in BMC.)
In each iteration of the algorithm (line 2), loops in the program P are unrolled (line 3) to produce a loop-free program, encoded as a transition system M(i). A new transition system \(M_t(i)\) is created (line 4) following the method described in Sect. 4.1, to capture precise taint propagation in the unrolled program M(i). Then lazy self-composition is applied (line 5), as shown in detail in Algorithm 3, based on the interplay between the taint model \(M_t(i)\) and the transition system M(i). In detail, for each variable x updated in M(i), where the state update is denoted \(x := \varphi \), we use \(x_t\) in \(M_t(i)\) to encode whether x is possibly tainted. We generate an SMT query to determine whether \(x_t\) is satisfiable. If it is unsatisfiable, i.e., \(x_t\) always evaluates to \( false \), we can conclude that high-security variables cannot affect the value of x. In this case, its duplicate variable \(x'\) in the self-composed program \(M_s(i)\) is set equal to x, eliminating the need to duplicate the computation that would produce \(x'\). Otherwise, if \(x_t\) is satisfiable (or unknown), we duplicate \(\varphi \) and update \(x'\) accordingly.
The self-composed program \(M_s(i)\) created by LazySC (Algorithm 3) is then checked by a bounded model checker, where a bad state is a state where any low-security output y (\(y \in L\), where L denotes the set of low-security variables) has a different value than its duplicate variable \(y'\) (line 6). (For ease of exposition, a simple definition of bad states is shown here. This can be suitably modified to account for the \(Obs_x(X)\) predicates described in Sect. 4.) A counterexample produced by the solver indicates a leak in the original program P. We also use an early termination check for BMC, encoded as an SMT-based query CheckLiveTaint, which essentially checks whether any live variable is tainted (line 10). If none of the live variables is tainted, i.e., any initial taint from high-security inputs has been squashed, then IfcBMC can stop unrolling the program any further. If no conclusive result is obtained, IfcBMC returns \( UNKNOWN \).
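The interplay described above can be sketched as a toy, finite-domain rendition of this procedure (illustrative only: the dynamic symbolic taint query and the SMT solver are replaced by exhaustive enumeration and a caller-supplied taint oracle):

```python
def ifc_bmc(prog, taint_after, bound, highs, lows):
    """Toy IFC-BMC. prog(i) returns the i-th unrolled update fn(x, h);
    taint_after(i) plays the role of the dynamic taint query
    (CheckLiveTaint) after iteration i."""
    pairs = {(l, l) for l in lows}          # low copies start equal
    for i in range(bound):
        fn = prog(i)
        nxt = set()
        for (x, xd) in pairs:
            for h1 in highs:
                for h2 in highs:
                    x2 = fn(x, h1)
                    # lazy self-composition: re-execute the update in
                    # the second copy only while taint may be live;
                    # otherwise the duplicate is just set equal
                    xd2 = fn(xd, h2) if taint_after(i) else x2
                    nxt.add((x2, xd2))
        pairs = nxt
        if any(x != xd for (x, xd) in pairs):   # bad state: y != y'
            return "UNSAFE"
        if not taint_after(i):                  # early termination
            return "SAFE"
    return "UNKNOWN"

# the first two iterations branch on the secret but add the same amount;
# afterwards the taint is squashed and unrolling stops early
prog = lambda i: (lambda x, h: x + 1 if (h and i < 2) else x + 1)
assert ifc_bmc(prog, lambda i: i < 2, bound=100, highs=(0, 1),
               lows=range(3)) == "SAFE"
```

Note that the call with bound=100 returns after only three iterations, which is the point of the early termination check.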
6 Implementation and Experiments
We have implemented prototypes of IfcCEGAR and IfcBMC for information flow checking. Both are implemented on top of SeaHorn [18], a software verification platform that encodes programs as CHC (Constrained Horn Clause) rules. It has a frontend based on LLVM [22] and backends to Z3 [15] and other solvers. Our prototype has a few limitations: it does not support bit-precise reasoning, and it does not support complex data structures such as lists. Our implementation of symbolic taint analysis is flexible in supporting any given taint policy (i.e., rules for taint generation, propagation, and removal). It uses an encoding that fully leverages SMT-based model checking techniques for precise taint analysis. We believe this module can be independently used in other applications for security verification.
6.1 Implementation Details
IfcCEGAR Implementation. As discussed in Sect. 5.1, the IfcCEGAR implementation uses taint analysis and self-composition synergistically and is tailored toward proving that programs are secure. Both taint analysis and self-composition are implemented as LLVM passes that instrument the program. Our prototype implementation executes these two passes interchangeably as the problem is being solved. The IfcCEGAR implementation uses Z3's CHC solver engine, called Spacer. Spacer, and therefore our IfcCEGAR implementation, does not handle the bit-vector theory, limiting the set of programs that can be verified using this prototype. Extending the prototype to support this theory will be the subject of future work.
IfcBMC Implementation. In the IfcBMC implementation, the loop unroller, taint analysis, and lazy self-composition are implemented as passes over the CHC representation that generate SMT queries for the back-end Z3 solver. Since the IfcBMC implementation uses Z3 directly, rather than Spacer, it can handle all the programs in our evaluation, unlike the IfcCEGAR implementation.
Input Format. The input to our tools is a C program with annotations indicating which variables are secret and the locations at which leaks should be checked. In addition, variables can be marked as untainted at specific locations.
6.2 Evaluation Benchmarks
For experiments we used a machine with an Intel Core i7-4578U and 8 GB of RAM. We tested our prototypes on several micro-benchmarks (Footnote 2) in addition to benchmarks inspired by real-world programs. For comparison against eager self-composition, we used the SeaHorn back-end solvers on a two-copy version of each benchmark. fibonacci is a micro-benchmark that computes the Nth Fibonacci number; it contains no secrets and serves as a sanity check, taken from [33]. list_4/8/16 are programs working with linked lists, where the trailing number indicates the maximum number of nodes used. Some linked-list nodes contain secrets while others hold public data, and the verification problem is to ensure that a particular function operating on the linked list does not leak the secret data. modadd_safe is a program that performs multi-word addition; modexp_safe/unsafe are variants of a program performing modular exponentiation; and pwdcheck_safe/unsafe are variants of a program that compares an input string with a secret password. The verification problem in these examples is to ensure that a loop iterator does not leak secret information, which could enable a timing attack. Among these benchmarks, list_4/8/16 use structs and modexp_safe/unsafe involve bit-vector operations, neither of which is supported by Spacer, and thus by our IfcCEGAR prototype.
6.3 IfcCEGAR Results
Table 1 shows the IfcCEGAR results on benchmark examples with varying parameter values. The columns show the time taken by eager self-composition (Eager SC) and by IfcCEGAR, and the number of refinements performed by IfcCEGAR. "TO" denotes a timeout of 300 s.
We note that all these examples are secure and do not leak information. Since our path-sensitive symbolic taint analysis is more precise than a type-based taint analysis, there are few counterexamples and refinements. In particular, for our first example, pwdcheck_safe, self-composition is not required, as our path-sensitive taint analysis proves that no taint propagates to the variables of interest; note that a type-based taint analysis cannot prove this example secure. For our second example, pwdcheck2_safe, path-sensitive taint analysis is not enough: it finds a counterexample, due to an implicit flow where a for-loop is conditioned on a tainted value, but there is no real leak because the loop executes a constant number of times. Our refinement-based approach easily handles this case: IfcCEGAR uses self-composition to find that the counterexample is spurious, refines the taint analysis model, and after one refinement step proves that pwdcheck2_safe is secure. While these examples are fairly small, they clearly show that IfcCEGAR is superior to eager self-composition.
6.4 IfcBMC Results
The experimental results for IfcBMC are shown in Table 2, where we also use unsafe versions of some benchmark examples. Results are shown for the total time taken by eager self-composition (Eager SC) and by the IfcBMC algorithm (as before, "TO" denotes a timeout of 300 s). IfcBMC produces an answer significantly faster than eager self-composition on all examples. The last two columns show the time spent on taint checks in IfcBMC and the number of taint checks performed.
To study the scalability of our prototype, we tested IfcBMC on the modular exponentiation program with different values for the maximum size of the integer array in the program. These results are shown in Table 3. Although the IfcBMC runtime grows exponentially, it remains reasonably fast: less than two minutes for an array of size 64.
7 Related Work
A rich body of literature has studied the verification of secure information flow in programs. Initial work dates back to Denning and Denning [16], who introduced a program analysis to ensure that confidential data does not flow to non-confidential outputs. This notion of confidentiality relates closely to (i) non-interference, introduced by Goguen and Meseguer [17], and (ii) separability, introduced by Rushby [27]. Each of these studies a notion of secure information flow in which confidential data is strictly disallowed from flowing to any non-confidential output. These definitions are often too restrictive for practical programs, where secret data may sometimes be allowed to flow to some non-secret output (e.g., if the data is encrypted before output), i.e., they require declassification [29]. Our approach allows easy and fine-grained declassification.
A large body of work has also studied the use of type systems that ensure secure information flow. For lack of space, we review a few exemplars and refer the reader to Sabelfeld and Myers [28] for a detailed survey. Early work in this area dates back to Volpano et al. [35], who introduced a type system that maintains secure information flow based on the work of Denning and Denning [16]. Myers introduced the JFlow programming language (later known as Jif: Java information flow) [25], which extends Java with security types. Jif has been used to build clean-slate, secure implementations of complex end-to-end systems, e.g., the Civitas [10] electronic voting system. More recently, Patrignani et al. [26] introduced the Java Jr. language, which extends Java with a security type system, automatically partitions the program into secure and non-secure parts, and executes the secure parts inside so-called protected module architectures. In contrast to these approaches, our work can be applied to existing security-critical code in languages like C with the addition of only a few annotations.
A different approach to verifying secure information flow is dynamic taint analysis (DTA) [3, 12, 13, 21, 30, 31], which instruments a program with taint variables and taint-tracking code. The advantages of DTA are that it scales to very large applications [21], can be accelerated using hardware support [13], and tracks information flow across processes, applications, and even the network [12]. However, taint analysis is necessarily imprecise and in practice yields both false positives and false negatives. False positives arise because taint analysis is an over-approximation. Somewhat surprisingly, false negatives are also introduced: tracking implicit flows with taint analysis leads to a deluge of false positives [30], causing practical taint-tracking systems to ignore implicit flows altogether. Our approach does not have this imprecision.
Our formulation of secure information flow is based on the self-composition construction proposed by Barthe et al. [5]. A specific type of self-composition, called product programs, was considered by Barthe et al. [4]; it does not allow control-flow divergence between the two copies. In general this may miss certain bugs, as it ignores implicit flows, but it is useful for verifying cryptographic code, which typically has very structured control flow. Almeida et al. [1] used the product construction to verify that certain functions in cryptographic libraries execute in constant time.
Terauchi and Aiken [33] generalized self-composition to consider k-safety, which uses \(k-1\) compositions of a program with itself; note that self-composition corresponds to 2-safety. An automated verifier for k-safety properties of Java programs, based on Cartesian Hoare Logic, was proposed by Sousa and Dillig [32]. A generalization of Cartesian Hoare Logic, called Quantitative Cartesian Hoare Logic, was introduced by Chen et al. [8]; the latter can also be used to reason about the execution time of cryptographic implementations. Among these efforts, our work is most closely related to that of Terauchi and Aiken [33], who used a type-based analysis as a preprocessing step to self-composition. We use a similar idea, but our taint analysis is more precise due to being path-sensitive, and it is used within an iterative CEGAR loop. Our path-sensitive taint analysis leads to fewer counterexamples and thereby cheaper self-composition, and our refinement approach easily handles examples with benign branches. In contrast to the other efforts, our work uses lazy instead of eager self-composition, and is thus more scalable, as demonstrated in our evaluation. A recent work [2] also employs trace-based refinement in security verification, but it does not use self-composition.
Our approach has some similarities to other problems related to tainting [19]. In particular, change-impact analysis is the problem of determining which parts of a program are affected by a change. Intuitively, it can be seen as a form of taint analysis, where the change is treated as taint. To solve it, Gyori et al. [19] propose combining an imprecise type-based approach with a precise semantics-preserving approach: the latter considers the program before and after the change, finds relational equivalences between the two versions, and uses these to strengthen the type-based approach. While our work has some similarities, there are crucial differences. First, our taint analysis is not type-based but path-sensitive, and it preserves the correctness of the defined abstraction. Second, our lazy self-composition is a form of abstraction-refinement framework, allowing a tighter integration between the imprecise (taint) and precise (self-composition) models.
8 Conclusions and Future Work
A well-known approach for verifying secure information flow is based on the notion of self-composition. In this paper, we have introduced a new approach to this verification problem based on lazy self-composition. Instead of eagerly duplicating the program, lazy self-composition uses a synergistic combination of symbolic taint analysis (on a single-copy program) and self-composition that duplicates only the relevant parts of the program, depending on the results of the taint analysis. We presented two instances of lazy self-composition: the first uses taint analysis and self-composition in a CEGAR loop; the second uses bounded model checking to dynamically issue taint queries and applies self-composition based on the results of these dynamic checks. Our algorithms have been implemented in the SeaHorn verification platform, and our results show that lazy self-composition can verify many instances not verified by eager self-composition.
In future work, we are interested in extending lazy self-composition to support learning quantified relational invariants. These invariants are often required when reasoning about information flow in shared data structures of unbounded size (e.g., unbounded arrays, linked lists) that contain both high- and low-security data. We are also interested in generalizing lazy self-composition beyond information flow to handle other k-safety properties such as injectivity, associativity, and monotonicity.
Notes
1. This name is inspired by the lazy abstraction approach [20] for software model checking.
2.
References
Almeida, J.B., Barbosa, M., Barthe, G., Dupressoir, F., Emmi, M.: Verifying constant-time implementations. In: 25th USENIX Security Symposium, USENIX Security, pp. 53–70 (2016)
Antonopoulos, T., Gazzillo, P., Hicks, M., Koskinen, E., Terauchi, T., Wei, S.: Decomposition instead of self-composition for proving the absence of timing channels. In: PLDI, pp. 362–375 (2017)
Babil, G.S., Mehani, O., Boreli, R., Kaafar, M.: On the effectiveness of dynamic taint analysis for protecting against private information leaks on Android-based devices. In: Proceedings of Security and Cryptography (2013)
Barthe, G., Crespo, J.M., Kunz, C.: Relational verification using product programs. In: Butler, M., Schulte, W. (eds.) FM 2011. LNCS, vol. 6664, pp. 200–214. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21437-0_17
Barthe, G., D'Argenio, P.R., Rezk, T.: Secure information flow by self-composition. In: 17th IEEE Computer Security Foundations Workshop, CSFW-17, pp. 100–114 (2004)
Biere, A., Cimatti, A., Clarke, E., Zhu, Y.: Symbolic model checking without BDDs. In: Cleaveland, W.R. (ed.) TACAS 1999. LNCS, vol. 1579, pp. 193–207. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-49059-0_14
Bradley, A.R.: SAT-based model checking without unrolling. In: Jhala, R., Schmidt, D. (eds.) VMCAI 2011. LNCS, vol. 6538, pp. 70–87. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-18275-4_7
Chen, J., Feng, Y., Dillig, I.: Precise detection of side-channel vulnerabilities using quantitative Cartesian Hoare logic. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, pp. 875–890. ACM, New York (2017)
Clarke, E., Grumberg, O., Jha, S., Lu, Y., Veith, H.: Counterexample-guided abstraction refinement. In: Emerson, E.A., Sistla, A.P. (eds.) CAV 2000. LNCS, vol. 1855, pp. 154–169. Springer, Heidelberg (2000). https://doi.org/10.1007/10722167_15
Clarkson, M.R., Chong, S., Myers, A.C.: Civitas: toward a secure voting system. In: Proceedings of the 2008 IEEE Symposium on Security and Privacy, SP 2008, pp. 354–368. IEEE Computer Society, Washington, DC (2008)
Clarkson, M.R., Schneider, F.B.: Hyperproperties. J. Comput. Secur. 18(6), 1157–1210 (2010)
Costa, M., Crowcroft, J., Castro, M., Rowstron, A., Zhou, L., Zhang, L., Barham, P.: Vigilante: end-to-end containment of Internet worms. In: Proceedings of the Symposium on Operating Systems Principles (2005)
Crandall, J.R., Chong, F.T.: Minos: control data attack prevention orthogonal to memory model. In: Proceedings of the 37th IEEE/ACM International Symposium on Microarchitecture (2004)
De Angelis, E., Fioravanti, F., Pettorossi, A., Proietti, M.: Relational verification through Horn clause transformation. In: Rival, X. (ed.) SAS 2016. LNCS, vol. 9837, pp. 147–169. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53413-7_8
de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78800-3_24
Denning, D.E., Denning, P.J.: Certification of programs for secure information flow. Commun. ACM 20(7), 504–513 (1977)
Goguen, J.A., Meseguer, J.: Security policies and security models. In: 1982 IEEE Symposium on Security and Privacy, Oakland, CA, USA, 26–28 April 1982, pp. 11–20 (1982)
Gurfinkel, A., Kahsai, T., Komuravelli, A., Navas, J.A.: The SeaHorn verification framework. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 343–361. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21690-4_20
Gyori, A., Lahiri, S.K., Partush, N.: Refining interprocedural change-impact analysis using equivalence relations. In: Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, Santa Barbara, CA, USA, 10–14 July 2017, pp. 318–328 (2017)
Henzinger, T.A., Jhala, R., Majumdar, R., Sutre, G.: Lazy abstraction. In: The SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 58–70 (2002)
Kang, M.G., McCamant, S., Poosankam, P., Song, D.: DTA++: dynamic taint analysis with targeted control-flow propagation. In: Proceedings of the Network and Distributed System Security Symposium (2011)
Lattner, C., Adve, V.S.: LLVM: a compilation framework for lifelong program analysis & transformation. In: 2nd IEEE/ACM International Symposium on Code Generation and Optimization, CGO, pp. 75–88 (2004)
McMillan, K.L.: Interpolation and SAT-based model checking. In: Hunt, W.A., Somenzi, F. (eds.) CAV 2003. LNCS, vol. 2725, pp. 1–13. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45069-6_1
Mordvinov, D., Fedyukovich, G.: Synchronizing constrained Horn clauses. In: EPiC Series in Computing, LPAR, vol. 46, pp. 338–355. EasyChair (2017)
Myers, A.C.: JFlow: practical mostly-static information flow control. In: Proceedings of the 26th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM (1999)
Patrignani, M., Agten, P., Strackx, R., Jacobs, B., Clarke, D., Piessens, F.: Secure compilation to protected module architectures. ACM Trans. Program. Lang. Syst. 37(2), 6:1–6:50 (2015)
Rushby, J.M.: Proof of separability: a verification technique for a class of security kernels. In: Dezani-Ciancaglini, M., Montanari, U. (eds.) Programming 1982. LNCS, vol. 137, pp. 352–367. Springer, Heidelberg (1982). https://doi.org/10.1007/3-540-11494-7_23
Sabelfeld, A., Myers, A.C.: Language-based information-flow security. IEEE J. Sel. Areas Commun. 21(1), 5–19 (2003)
Sabelfeld, A., Sands, D.: Declassification: dimensions and principles. J. Comput. Secur. 17(5), 517–548 (2009)
Schwartz, E., Avgerinos, T., Brumley, D.: All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask). In: Proceedings of the 2010 IEEE Symposium on Security and Privacy (2010)
Song, D., et al.: BitBlaze: a new approach to computer security via binary analysis. In: Sekar, R., Pujari, A.K. (eds.) ICISS 2008. LNCS, vol. 5352, pp. 1–25. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-89862-7_1
Sousa, M., Dillig, I.: Cartesian Hoare logic for verifying k-safety properties. In: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2016, pp. 57–69. ACM, New York (2016)
Terauchi, T., Aiken, A.: Secure information flow as a safety problem. In: Hankin, C., Siveroni, I. (eds.) SAS 2005. LNCS, vol. 3672, pp. 352–367. Springer, Heidelberg (2005). https://doi.org/10.1007/11547662_24
Vizel, Y., Gurfinkel, A.: Interpolating property directed reachability. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 260–276. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08867-9_17
Volpano, D., Irvine, C., Smith, G.: A sound type system for secure flow analysis. J. Comput. Secur. 4(2–3), 167–187 (1996)
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2018 The Author(s)
About this paper
Yang, W., Vizel, Y., Subramanyan, P., Gupta, A., Malik, S. (2018). Lazy Self-composition for Security Verification. In: Chockler, H., Weissenbacher, G. (eds.) Computer Aided Verification. CAV 2018. Lecture Notes in Computer Science, vol. 10982. Springer, Cham. https://doi.org/10.1007/978-3-319-96142-2_11
Print ISBN: 978-3-319-96141-5. Online ISBN: 978-3-319-96142-2.