Constraint-Based Monitoring of Hyperproperties

Verifying hyperproperties at runtime is a challenging problem as hyperproperties, such as non-interference and observational determinism, relate multiple computation traces with each other. It is necessary to store previously seen traces, because every new incoming trace needs to be compatible with every run of the system observed so far. Furthermore, the new incoming trace poses requirements on future traces. In our monitoring approach, we focus on those requirements by rewriting a hyperproperty in the temporal logic HyperLTL to a Boolean constraint system. A hyperproperty is then violated by multiple runs of the system if the constraint system becomes unsatisfiable. We compare our implementation, which utilizes either BDDs or a SAT solver to store and evaluate constraints, to the automata-based monitoring tool RVHyper.


Introduction
As today's complex and large-scale systems are usually far beyond the scope of classic verification techniques like model checking or theorem proving, we need lightweight monitors for controlling the flow of information. By instrumenting systems that operate in an unpredictable, privacy-critical environment with efficient monitoring techniques, countermeasures can be enacted before irreparable information leaks happen. Information-flow policies, however, cannot be monitored with standard runtime verification techniques, as they relate multiple runs of a system. For example, observational determinism [19,21,24] is a policy stating that altering non-observable input has no impact on the observable behavior. Hyperproperties [7] are a generalization of trace properties and are thus capable of expressing information-flow policies. HyperLTL [6] is a recently introduced temporal logic for hyperproperties, which extends Linear-time Temporal Logic (LTL) [20] with trace variables and explicit trace quantification. Observational determinism is expressed as the formula $\forall \pi, \pi'.\ (out_\pi \leftrightarrow out_{\pi'})\ \mathcal{W}\ (in_\pi \nleftrightarrow in_{\pi'})$, stating that all traces $\pi, \pi'$ should agree on the output as long as they agree on the inputs.
In contrast to classic trace property monitoring, where a single run suffices to determine a violation, in runtime verification of HyperLTL formulas we are concerned with whether a set of runs through a system violates a given specification. In the common setting, those runs are given sequentially to the runtime monitor [1,2,12,13], which determines whether the given set of runs violates the specification. An alternative view on HyperLTL monitoring is that every new incoming trace poses requirements on future traces. For example, the event {in, out} in the observational determinism example above asserts that for every other trace, the output out has to be enabled if in is enabled. Approaches based on static automata constructions [1,12,13] perform very well on this type of specification, although their scalability is intrinsically limited by certain parameters: the automaton construction becomes a bottleneck for more complex specifications, especially with respect to the number of atomic propositions. Furthermore, the computational workload grows steadily with the number of incoming traces, as every trace seen so far has to be checked against every new trace. Even optimizations [12] that minimize the number of traces that must be stored turn out to be too coarse-grained, as the following example shows. Consider the monitoring of the HyperLTL formula $\forall \pi, \pi'.\ \Box(a_\pi \rightarrow \neg b_{\pi'})$, which states that globally, if a occurs on any trace $\pi$, then b is not allowed to hold on any trace $\pi'$, on the following incoming traces: In prior work [12], we observed that traces that pose fewer requirements on future traces can safely be discarded from the monitoring process. In the example above, the requirements of trace 1 are dominated by the requirements of trace 2, namely that b is not allowed to hold on the first and second position of new incoming traces. Hence, trace 1 no longer needs to be stored in order to detect a violation.
But with the proposed language inclusion check in [12], neither trace 2 nor trace 3 can be discarded, as they pose incomparable requirements. They have, however, overlapping constraints, that is, they both enforce ¬b in the first step.
To further improve the conciseness of the stored trace information, we use rewriting, which is a more fine-grained monitoring approach. The basic idea is to track the requirements that future traces have to fulfill, instead of storing a set of traces. In the example above, we would track the requirement that b is not allowed to hold on the first three positions of every freshly incoming trace. Rewriting has been applied successfully to trace properties, namely LTL formulas [17]. The idea is to partially evaluate a given LTL specification $\varphi$ on an incoming event by unrolling $\varphi$ according to the expansion laws of the temporal operators. The result of a single rewrite is again an LTL formula representing the updated specification, which the continuing execution has to satisfy. We use rewriting techniques to reduce $\forall^2$ HyperLTL formulas to LTL constraints and check those constraints for inconsistencies corresponding to violations.
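The idea of tracking requirements instead of traces can be illustrated with a minimal sketch that is hand-specialized to the formula $\forall \pi, \pi'.\ \Box(a_\pi \rightarrow \neg b_{\pi'})$. It is not the general algorithm of this paper: the class name `RequirementMonitor`, the helper `tr`, and the three example traces are assumptions made for illustration, with the traces chosen to be consistent with the description above.

```python
from typing import FrozenSet, List, Set

Event = FrozenSet[str]
Trace = List[Event]


class RequirementMonitor:
    """Tracks requirements on future traces for G(a_pi -> not b_pi')."""

    def __init__(self) -> None:
        # Positions at which an earlier trace contained `a`, so `b` is forbidden.
        self.forbid_b: Set[int] = set()
        # Positions at which an earlier trace contained `b`, so `a` is forbidden
        # (this direction arises from swapping the roles of pi and pi').
        self.forbid_a: Set[int] = set()

    def observe(self, trace: Trace) -> bool:
        """Return True if `trace` is compatible with all traces seen so far."""
        for i, event in enumerate(trace):
            if "a" in event and "b" in event:
                return False  # pi = pi' is also an instance of the formula
            if "b" in event and i in self.forbid_b:
                return False
            if "a" in event and i in self.forbid_a:
                return False
        # The trace is compatible: merge its requirements into the stored ones.
        for i, event in enumerate(trace):
            if "a" in event:
                self.forbid_b.add(i)
            if "b" in event:
                self.forbid_a.add(i)
        return True


def tr(*events: str) -> Trace:
    """Build a trace from strings such as "a", "ab", or "" (one per position)."""
    return [frozenset(e) for e in events]


# Hypothetical example traces (an assumption, not taken from the paper).
monitor = RequirementMonitor()
accepted = [
    monitor.observe(t)
    for t in (tr("a", "", "", ""), tr("a", "a", "", ""), tr("a", "", "a", ""))
]
violating = monitor.observe(tr("", "", "b", ""))  # b at a forbidden position
```

In this sketch the stored state is a set of positions whose size is bounded by the trace length, independent of how many traces have been observed.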
In this paper, we introduce a complete and provably correct rewriting-based monitoring approach for $\forall^2$ HyperLTL formulas. Our algorithm rewrites a HyperLTL formula and a single event into a constraint composed of plain LTL and HyperLTL. For example, assume the event {in, out} while monitoring observational determinism as formalized above. The first step of the rewriting applies the expansion laws for the temporal operators, which results in $(in_\pi \nleftrightarrow in_{\pi'}) \lor (out_\pi \leftrightarrow out_{\pi'}) \land \bigcirc((out_\pi \leftrightarrow out_{\pi'})\ \mathcal{W}\ (in_\pi \nleftrightarrow in_{\pi'}))$. The event {in, out} is rewritten for atomic propositions indexed by the trace variable $\pi$. This means replacing each occurrence of $in_\pi$ or $out_\pi$ in the current expansion step, i.e., before the $\bigcirc$ operator, with $\top$. Additionally, we strip the trace variable $\pi'$ in the current expansion step from all other atomic propositions. This leaves us with $(\top \nleftrightarrow in) \lor (\top \leftrightarrow out) \land \bigcirc((out_\pi \leftrightarrow out_{\pi'})\ \mathcal{W}\ (in_\pi \nleftrightarrow in_{\pi'}))$. After simplification we have $\neg in \lor out \land \bigcirc((out_\pi \leftrightarrow out_{\pi'})\ \mathcal{W}\ (in_\pi \nleftrightarrow in_{\pi'}))$ as the new specification, which consists of a plain LTL part and a HyperLTL part. Based on this, we incrementally build a Boolean constraint system: we start by encoding the constraints corresponding to the LTL part and encode the HyperLTL part as variables. Those variables will then be incrementally defined when more elements of the trace become available. With this approach, we solely store the information needed to detect violations of a given hyperproperty.
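The simplification step in this example can be checked mechanically. The following sketch compares the substituted and the simplified formula on all truth assignments; the Boolean parameter `nxt` stands for the part behind the next operator, and the function names are assumptions made for illustration. Conjunction is taken to bind more tightly than disjunction.

```python
from itertools import product


def substituted(in2: bool, out2: bool, nxt: bool) -> bool:
    """(T xor in) or ((T iff out) and next), after inserting the event {in, out}."""
    in1, out1 = True, True  # both propositions hold in the observed event
    return (in1 != in2) or ((out1 == out2) and nxt)


def simplified(in2: bool, out2: bool, nxt: bool) -> bool:
    """(not in) or (out and next), the requirement posed on the other trace."""
    return (not in2) or (out2 and nxt)


# Exhaustive comparison over the eight assignments of (in, out, next).
equivalent = all(
    substituted(*v) == simplified(*v) for v in product([False, True], repeat=3)
)
```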
We evaluate two implementations of our approach, based on BDDs and SAT solving, against RVHyper [13], a highly optimized automaton-based monitoring tool for temporal hyperproperties. Our experiments show that the rewriting approach performs equally well in general and better on a class of formulas that we call guarded invariants, i.e., formulas that define a certain invariant relation between two traces.
Related Work. With the need to express temporal hyperproperties in a succinct and formal manner, the above-mentioned temporal logics HyperLTL and HyperCTL* [6] have been proposed. The model-checking [6,14,15], satisfiability [9], and realizability [10] problems of HyperLTL have been studied before.
Runtime verification of HyperLTL formulas was first considered for (co-)$k$-safety hyperproperties [1]. In the same paper, the notion of monitorability for HyperLTL was introduced. The authors also identified syntactic classes of HyperLTL formulas that are monitorable, and they proposed a monitoring algorithm based on a progression logic expressing trace interdependencies and the composition of an LTL$_3$ monitor.
Another automata-based approach for monitoring HyperLTL formulas was proposed in [12]. Given a HyperLTL specification, the algorithm starts by creating a deterministic monitor automaton. For every incoming trace, it is then checked that all combinations with the already seen traces are accepted by the automaton. In order to minimize the number of stored traces, a language-inclusion-based algorithm is proposed, which allows pruning traces with redundant information. Furthermore, a method is proposed that reduces the number of trace combinations that have to be checked by analyzing the specification for relations such as reflexivity, symmetry, and transitivity with a HyperLTL-SAT solver [9,11]. The algorithm is implemented in the tool RVHyper [13], which was used to monitor information-flow policies and to detect spurious dependencies in hardware designs.
Another rewriting-based monitoring approach for HyperLTL is outlined in [5]. The idea is to identify a set of propositions of interest and aggregate constraints such that inconsistencies in the constraints indicate a violation of the HyperLTL formula. While the paper describes the building blocks for such a monitoring approach with a number of examples, we have, unfortunately, not been successful in applying the algorithm to other hyperproperties of interest, such as observational determinism.
In [3], the authors study the complexity of monitoring hyperproperties. They show that the form and size of the input, as well as the formula, have a significant impact on the feasibility of the monitoring process. They differentiate between several input forms and study their complexity: a set of linear traces, tree-shaped Kripke structures, and acyclic Kripke structures. For acyclic structures and alternation-free HyperLTL formulas, the problem's complexity gets as low as NC.
In [4], the authors discuss examples where static analysis can be combined with runtime verification techniques to monitor HyperLTL formulas beyond the alternation-free fragment. They discuss the challenges in monitoring formulas beyond this fragment and lay the foundations towards a general method.

Preliminaries
Let $AP$ be a finite set of atomic propositions and let $\Sigma = 2^{AP}$ be the corresponding alphabet. An infinite trace $t \in \Sigma^\omega$ is an infinite sequence over the alphabet. A subset $T \subseteq \Sigma^\omega$ is called a trace property. A hyperproperty $H \subseteq 2^{(\Sigma^\omega)}$ is a generalization of a trace property. A finite trace $t \in \Sigma^+$ is a finite sequence over $\Sigma$. In the case of finite traces, $|t|$ denotes the length of a trace. We use the following notation to access and manipulate traces: Let $t$ be a trace and $i$ be a natural number. $t[i]$ denotes the $i$-th element of $t$. Therefore, $t[0]$ represents the first element of the trace. Let $j$ be a natural number. If $j \geq i$ and $i < |t|$, then $t[i, j]$ denotes the sequence $t[i]\, t[i+1] \cdots t[\min(j, |t|-1)]$. Otherwise it denotes the empty trace $\epsilon$. $t[i\rangle$ denotes the suffix of $t$ starting at position $i$. For two finite traces $s$ and $t$, we denote their concatenation by $s \cdot t$.
HyperLTL Syntax. HyperLTL [6] extends LTL with trace variables and trace quantifiers. Let $V$ be a finite set of trace variables. The syntax of HyperLTL is given by the grammar $\varphi ::= \forall \pi.\, \varphi \mid \exists \pi.\, \varphi \mid \psi$ and $\psi ::= a_\pi \mid \neg \psi \mid \psi \lor \psi \mid \bigcirc \psi \mid \psi\ \mathcal{U}\ \psi$, where $a \in AP$ is an atomic proposition and $\pi \in V$ is a trace variable. Atomic propositions are indexed by trace variables. The explicit trace quantification enables us to express properties like "on all traces $\varphi$ must hold", expressed by $\forall \pi.\, \varphi$. Dually, we can express "there exists a trace such that $\varphi$ holds", expressed by $\exists \pi.\, \varphi$. We use the standard derived operators release $\varphi\ \mathcal{R}\ \psi := \neg(\neg\varphi\ \mathcal{U}\ \neg\psi)$, eventually $\Diamond\varphi := true\ \mathcal{U}\ \varphi$, globally $\Box\varphi := \neg\Diamond\neg\varphi$, and weak until $\varphi_1\ \mathcal{W}\ \varphi_2 := (\varphi_1\ \mathcal{U}\ \varphi_2) \lor \Box\varphi_1$. As we use the finite trace semantics, $\bigcirc\varphi$ denotes the strong version of the next operator, i.e., if a trace ends before the satisfaction of $\varphi$ can be determined, the satisfaction relation, defined below, evaluates to false. To enable duality in the finite trace setting, we additionally use the weak next operator $\bigcirc_w \varphi$, which evaluates to true if a trace ends before the satisfaction of $\varphi$ can be determined and is defined as $\bigcirc_w \varphi := \neg\bigcirc\neg\varphi$. We call $\psi$ of a HyperLTL formula $Q.\psi$, with an arbitrary quantifier prefix $Q$, the body of the formula. A HyperLTL formula $Q.\psi$ is in the alternation-free fragment if $Q$ consists either solely of universal quantifiers or solely of existential quantifiers. We also denote the respective alternation-free fragments as the $\forall^n$ fragment and the $\exists^n$ fragment, with $n$ being the number of quantifiers in the prefix.
Finite Trace Semantics. We recap the finite trace semantics for HyperLTL [5], which is itself based on the finite trace semantics of LTL [18]. In the following, when using $\mathcal{L}(\varphi)$ we refer to the finite trace semantics of a HyperLTL formula $\varphi$. Let $\Pi_{fin} : V \rightarrow \Sigma^+$ be a partial function mapping trace variables to finite traces. We define $\epsilon[0]$ as the empty set. $\Pi_{fin}[i\rangle$ denotes the trace assignment that is equal to $\Pi_{fin}(\pi)[i\rangle$ for all $\pi \in dom(\Pi_{fin})$. By slight abuse of notation, we write $t \in \Pi_{fin}$ to access traces $t$ in the image of $\Pi_{fin}$. The satisfaction of a HyperLTL formula $\varphi$ over a finite trace assignment $\Pi_{fin}$ and a set of finite traces $T$, denoted by $\Pi_{fin} \models_T \varphi$, is defined as follows: Due to the duality of $\mathcal{U}/\mathcal{R}$, $\bigcirc/\bigcirc_w$, $\exists/\forall$, and the standard Boolean operators, every HyperLTL formula $\varphi$ can be transformed into negation normal form (NNF), i.e., for every $\varphi$ there is some $\psi$ in negation normal form such that for all $\Pi_{fin}$ and $T$ it holds that $\Pi_{fin} \models_T \varphi$ if, and only if, $\Pi_{fin} \models_T \psi$. The standard LTL semantics, written $t \models_{LTL_{fin}} \varphi$ for some LTL formula $\varphi$, is equal to $\{\pi \mapsto t\} \models_\emptyset \varphi'$ for the finite trace assignment $\{\pi \mapsto t\}$, where $\varphi'$ is derived from $\varphi$ by replacing every proposition $p \in AP$ by $p_\pi$.
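The difference between the strong and the weak next operator on finite traces can be made concrete with a small evaluator for the single-trace case. This is a sketch under the assumption of the usual finite-trace reading of LTL (strong next requires a successor position, until must be fulfilled within the trace); the tuple encoding of formulas and the names `holds`, `eventually`, and `globally` are assumptions made for illustration.

```python
from typing import FrozenSet, Sequence, Tuple, Union

Event = FrozenSet[str]
Formula = Union[str, Tuple]  # "p" is an atomic proposition; tuples are operators


def holds(phi: Formula, t: Sequence[Event]) -> bool:
    """Evaluate an LTL formula on a finite trace t (the empty trace is allowed)."""
    if isinstance(phi, str):
        # The first event of the empty trace is treated as the empty set.
        return len(t) > 0 and phi in t[0]
    op = phi[0]
    if op == "true":
        return True
    if op == "not":
        return not holds(phi[1], t)
    if op == "or":
        return holds(phi[1], t) or holds(phi[2], t)
    if op == "and":
        return holds(phi[1], t) and holds(phi[2], t)
    if op == "next":  # strong next: a successor position must exist
        return len(t) > 1 and holds(phi[1], t[1:])
    if op == "wnext":  # weak next: true if the trace ends here
        return len(t) <= 1 or holds(phi[1], t[1:])
    if op == "until":
        for i in range(len(t)):
            if holds(phi[2], t[i:]):
                return True
            if not holds(phi[1], t[i:]):
                return False
        return False  # the right-hand side never held within the trace
    raise ValueError(f"unknown operator: {op!r}")


def eventually(phi: Formula) -> Formula:
    return ("until", ("true",), phi)


def globally(phi: Formula) -> Formula:
    return ("not", eventually(("not", phi)))


single = [frozenset({"a"})]
strong = holds(("next", "a"), single)   # no successor position exists
weak = holds(("wnext", "a"), single)    # trace ends, so weak next holds
dual = holds(("not", ("next", ("not", "a"))), single)  # agrees with `weak`
```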

Rewriting HyperLTL
Given the body $\varphi$ of a $\forall^2$ HyperLTL formula $\forall \pi, \pi'.\, \varphi$, and a finite trace $t \in \Sigma^+$, we define alternative language characterizations. These capture the intuitive idea that, if one fixes a finite trace $t$, the language of $\forall \pi, \pi'.\, \varphi$ includes exactly those traces $t'$ that satisfy $\varphi$ in conjunction with $t$.
We call $\hat{\varphi} := \varphi \land \varphi[\pi'/\pi, \pi/\pi']$ the symmetric closure of $\varphi$, where $\varphi[\pi'/\pi, \pi/\pi']$ represents the expression $\varphi$ in which the trace variables $\pi, \pi'$ are swapped. The language of the symmetric closure, when fixing one trace variable, is equivalent to the language of $\varphi$.
Lemma 1. Given the body $\varphi$ of a $\forall^2$ HyperLTL formula $\forall \pi, \pi'.\, \varphi$, and a finite
We exploit this to rewrite a $\forall^2$ HyperLTL formula into an LTL formula. We define the projection $\varphi|^\pi_t$ of the body $\varphi$ of a $\forall^2$ HyperLTL formula $\forall \pi, \pi'.\, \varphi$ in NNF and a finite trace $t \in \Sigma^+$ to an LTL formula recursively on the structure of $\varphi$:
- $\neg a_\pi$ and $\neg a_{\pi'}$ are proven analogously.

Induction Step ($t = e \cdot t^*$, where $e \in \Sigma$, $t^* \in \Sigma^+$): The induction hypothesis states that $\forall t' \in \Sigma^+.\ t' \in \mathcal{L}^\pi_{t^*}(\varphi) \Leftrightarrow t' \models_{LTL_{fin}} \varphi|^\pi_{t^*}$ (IH). We make use of structural induction over $\varphi$. All cases without temporal operators are covered, as their proofs above were independent of $|t|$. The structural induction hypothesis states for all strict subformulas $\psi$ that $\forall t' \in \Sigma^+.\ t' \in \mathcal{L}^\pi$

Constraint-based Monitoring
For monitoring, we need to define an incremental rewriting that accurately models the semantics of $\varphi|^\pi_t$ while still being able to detect violations early. To this end, we define an operation $\varphi[\pi, e, i]$, where $e \in \Sigma$ is an event and $i$ is the current position in the trace. $\varphi[\pi, e, i]$ transforms $\varphi$ into a propositional formula, where the variables are either indexed atomic propositions $p_i$ for $p \in AP$, or variables $v^-_{\varphi',i+1}$ and $v^+_{\varphi',i+1}$ that act as placeholders until new information about the trace comes in. Whenever the next event $e'$ occurs, the variables are defined with the result of $\varphi'[\pi, e', i+1]$. If the trace ends, the variables are set to true and false for $v^+$ and $v^-$, respectively. We define $\varphi[\pi, e, i]$ of a $\forall^2$ HyperLTL formula $\forall \pi, \pi'.\, \varphi$ in NNF, event $e \in \Sigma$, and $i \geq 0$ recursively on the structure of the body $\varphi$:
We encode a $\forall^2$ HyperLTL formula and finite traces into a constraint system, which, as we will show, is satisfiable if and only if the given traces satisfy the formula w.r.t. the finite semantics of HyperLTL. We write $v_{\varphi,i}$ to denote either $v^-_{\varphi,i}$ or $v^+_{\varphi,i}$. For $e \in \Sigma$ and $t \in \Sigma^*$, we define
where we use $v_{\psi,i+1} \in \varphi[\pi, e, i]$ to denote variables $v_{\psi,i+1}$ occurring in the propositional formula $\varphi[\pi, e, i]$. $enc$ is used to transform a trace into a propositional formula, e.g., $enc^0_{\{a,b\}}(\{a\}\{a, b\}) = a_0 \land \neg b_0 \land a_1 \land b_1$. For $n = 0$ we omit the annotation, i.e., we write $enc_{AP}(t)$ instead of $enc^0_{AP}(t)$. We also omit the index $AP$ if it is clear from the context.
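The trace encoding can be sketched in a few lines. The sketch below assumes that the annotation $n$ is the index of the first position to be encoded and that positional variables are named `p_i`; both the naming scheme and the helper `as_conjunction` are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List

Event = FrozenSet[str]


def enc(ap: Iterable[str], trace: List[Event], n: int = 0) -> Dict[str, bool]:
    """Map each positional variable p_i (n <= i < n + len(trace)) to its value."""
    props = sorted(ap)  # fix an order so the output is deterministic
    assignment: Dict[str, bool] = {}
    for offset, event in enumerate(trace):
        for p in props:
            assignment[f"{p}_{n + offset}"] = p in event
    return assignment


def as_conjunction(assignment: Dict[str, bool]) -> str:
    """Render the assignment as a conjunction of literals."""
    return " ∧ ".join(v if val else f"¬{v}" for v, val in assignment.items())


# The example from the text: enc over {a, b} of the trace {a}{a, b}.
example = as_conjunction(enc({"a", "b"}, [frozenset({"a"}), frozenset({"a", "b"})]))
```

Representing the encoding as an assignment rather than a formula makes it directly usable as a set of unit assumptions for a solver.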
By slight abuse of notation, we use $constr^n(\varphi, t)$ for some quantifier-free HyperLTL formula $\varphi$ to denote $constr(v_{\varphi,n}, t)$ if $|t| > 0$. For a trace $t' \in \Sigma^+$, we use the notation $enc(t') \models constr(\varphi, t)$, which evaluates to true if, and only if, $enc(t') \land constr(\varphi, t)$ is satisfiable. Figure 1 depicts our constraint-based algorithm. Note that this algorithm can be used in an offline and online fashion. Before we give algorithmic details, consider again the observational determinism example from the introduction, which is expressed as the $\forall^2$ HyperLTL formula $\forall \pi, \pi'.\ (out_\pi \leftrightarrow out_{\pi'})\ \mathcal{W}\ (in_\pi \nleftrightarrow in_{\pi'})$. The basic idea of the algorithm is to transform the HyperLTL formula to a formula consisting partially of LTL, which expresses the requirements of the incoming trace in the current step, and partially of HyperLTL. Assuming the event {in, out}, we transform the observational determinism formula to the following formula: $\neg in \lor out \land \bigcirc((out_\pi \leftrightarrow out_{\pi'})\ \mathcal{W}\ (in_\pi \nleftrightarrow in_{\pi'}))$.

Algorithm
[Figure 1: constraint-based monitoring algorithm. Input: $\forall \pi, \pi'.\, \varphi$, $T \subseteq \Sigma^+$. Output: violation or no violation.]
A Boolean constraint system is then built incrementally: we start by encoding the constraints corresponding to the LTL part (in front of the next operator) and encode the HyperLTL part (after the next operator) as variables that are defined when more events of the trace come in. We continue by explaining the algorithm in detail. In line 1, we construct $\psi$ as the negation normal form of the symmetric closure of the original formula. We build two constraint systems: $C$, containing the constraints of previous traces, and $C_t$ (built incrementally), containing the constraints for the current trace $t$. Consequently, we initialize $C$ with $\top$ and $C_t$ with $v_{\psi,0}$ (lines 2 and 4). If the trace ends, we define the remaining $v$ variables according to their polarities and add $C_t$ to $C$. For each new event $e_i$ in the trace $t$, and each "open" constraint in $C_t$ corresponding to step $i$, i.e., $v_{\phi,i} \in C_t$, we rewrite the formula $\phi$ (line 9) and define $v_{\phi,i}$ with the rewriting result, which potentially introduces new open constraints $v_{\phi',i+1}$ for the next step $i+1$. The constraint encoding of the current trace is aggregated in the constraint $t_{enc}$ (line 7). If the constraint system, given the encoding of the current trace, turns out to be unsatisfiable, a violation of the specification is detected, which is then returned.
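The interplay of variable definitions and the satisfiability check can be sketched for the observational determinism formula. This is a hand-specialized illustration, not the algorithm of Fig. 1: it keeps one list of definitions per trace instead of a shared constraint system, it replaces the BDD or SAT back end by direct evaluation, and the names `constr`, `satisfiable`, and `monitor` as well as the data layout are assumptions. Each entry $i$ defines $v_i$ as $(in'_i \nleftrightarrow in_i) \lor ((out'_i \leftrightarrow out_i) \land v_{i+1})$, where the primed variables belong to the other trace.

```python
from typing import FrozenSet, List, Tuple

Event = FrozenSet[str]
# Entry i records (in holds at i, out holds at i) and thereby defines v_i.
Definitions = List[Tuple[bool, bool]]


def constr(trace: List[Event]) -> Definitions:
    """Rewrite the formula event by event, yielding one definition per step."""
    return [("in" in e, "out" in e) for e in trace]


def satisfiable(defs: Definitions, other: List[Event]) -> bool:
    """Check enc(other) together with the definitions of v_0, v_1, ...

    The variable left open at the end is set to true (weak until). Positions
    that `other` does not fix are unconstrained and can always be chosen so
    that the remaining definitions are satisfied.
    """
    v_next = True
    for (in_i, out_i), e in reversed(list(zip(defs, other))):
        v_next = (("in" in e) != in_i) or ((("out" in e) == out_i) and v_next)
    return v_next


def monitor(traces: List[List[Event]]) -> str:
    """Process traces sequentially and report the first inconsistency."""
    stored: List[Definitions] = []  # plays the role of C in this sketch
    for t in traces:
        if not all(satisfiable(c, t) for c in stored):
            return "violation"
        stored.append(constr(t))
    return "no violation"


IN, OUT, BOTH = frozenset({"in"}), frozenset({"out"}), frozenset({"in", "out"})
same_inputs = monitor([[BOTH, IN], [BOTH, BOTH]])   # outputs differ at step 1
diverging_inputs = monitor([[BOTH, IN], [BOTH, OUT]])  # inputs differ at step 1
```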
In the following, we sketch two algorithmic improvements. First, instead of storing the constraints corresponding to traces individually, we use a new data structure, which is a tree maintaining nodes of formulas, their corresponding variables, and also child nodes. Such a node corresponds to already seen rewrites. The initial node captures the (transformed) specification (similar to line 4) and is also the root of the tree structure, which represents all generated constraints and replaces $C$ in Fig. 1. Whenever a trace deviates in its rewrite result, a new child or branch is added to the tree. If a rewrite result is already present in the node tree structure, there is no need to create any new constraints or new variables. This is crucial in case we observe many equal traces or traces behaving effectively the same. In case no new constraints were added to the constraint system, we omit a superfluous check for satisfiability.
Second, we use conjunct splitting to utilize the node tree optimization even more. We illustrate the basic idea with an example. Consider $\forall \pi, \pi'.\, \varphi$ with $\varphi = \Box((a_\pi \leftrightarrow a_{\pi'}) \lor (b_\pi \leftrightarrow b_{\pi'}))$, which demands that on all executions, at each position, at least one of the propositions a or b agrees in its evaluation. Consider the two traces $t_1 = \{a\}\{a\}\{a\}$ and $t_2 = \{a\}\{a, b\}\{a\}$, which satisfy the specification. As both traces feature the same first event, they also share the same rewrite result for the first position. Interestingly, on the second position, we get $(a \lor \neg b) \land s_\varphi$ for $t_1$ and $(a \lor b) \land s_\varphi$ for $t_2$ as the rewrite results. While these constraints are no longer equal, by the nature of invariants, both feature the same subterm on the right-hand side of the conjunction. We split the resulting constraint along its syntactic structure, such that we no longer have to introduce a branch in the tree.
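Conjunct splitting can be illustrated with a small store that registers each distinct conjunct once. This is a sketch of the sharing idea only, not of the node tree itself; the tuple encoding of formulas and the names `conjuncts` and `ConstraintStore` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple, Union

# ("and", f, g) is a conjunction; every other formula is treated as atomic here.
Formula = Union[str, Tuple]


def conjuncts(phi: Formula) -> List[Formula]:
    """Split a formula along its top-level conjunctions."""
    if isinstance(phi, tuple) and phi[0] == "and":
        return conjuncts(phi[1]) + conjuncts(phi[2])
    return [phi]


class ConstraintStore:
    """Assigns one node id per distinct conjunct and reports which are new."""

    def __init__(self) -> None:
        self.nodes: Dict[Formula, int] = {}

    def add(self, phi: Formula) -> List[Formula]:
        fresh: List[Formula] = []
        for c in conjuncts(phi):
            if c not in self.nodes:
                self.nodes[c] = len(self.nodes)
                fresh.append(c)
        return fresh


# Rewrite results at the second position of t1 and t2 from the example.
r1 = ("and", ("or", "a", ("not", "b")), "s_phi")
r2 = ("and", ("or", "a", "b"), "s_phi")

store = ConstraintStore()
new_for_t1 = store.add(r1)  # both conjuncts are new
new_for_t2 = store.add(r2)  # only (a or b) is new; s_phi is shared
```

If `add` returns an empty list, no constraint was added, which corresponds to the case in which the satisfiability check can be skipped.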

Correctness
In this technical subsection, we will formally prove correctness of our algorithm by showing that our incremental construction of the Boolean constraints is equisatisfiable to the HyperLTL rewriting presented in Section 3. We begin by showing that satisfiability is preserved when shifting the indices, as stated by the following lemma.
In the following lemma and corollary, we show that the semantics of the next operators matches the finite LTL semantics.
Proof. We prove this by contradiction. We choose $t$, $t'$ as well as $\varphi$ arbitrarily, but in a way such that $enc(t') \not\models constr(\varphi, t)$ holds. Assume that there exists a continuation of $t'$, which we call $t''$, for which $enc(t'') \models constr(\varphi, t)$ holds. So there has to exist a model assigning truth values to the variables in $constr(\varphi, t)$ such that the constraint system is consistent. From this model we extract all assigned truth values for positional variables for positions $|t'|$ to $|t''| - 1$. As $t'$ is a prefix of $t''$, we can use these truth values to construct a valid model for $enc(t') \models constr(\varphi, t)$, which is a contradiction.
Corollary 3. For any $\forall^2$ HyperLTL formula $\forall \pi, \pi'.\, \varphi$ in negation normal form over atomic propositions $AP$ and any finite set of finite traces $T \in \mathcal{P}(\Sigma^+)$ and
Proof. It holds that $\forall t, t' \in \Sigma^+.\ t = t' \rightarrow constr(\varphi, t) = constr(\varphi, t')$. The claim follows with the same reasoning as in earlier proofs combined with Corollary 2.

Experimental Evaluation
We implemented two versions of the algorithm presented in this paper. The first implementation encodes the constraint system as a Boolean satisfiability problem (SAT), whereas the second one represents it as a (reduced ordered) binary decision diagram (BDD). The formula rewriting is implemented in a Maude [8] script.
The constraint system is solved by either CryptoMiniSat [23] or CUDD [22]. All benchmarks were executed on an Intel Core i5-6200U CPU @2.30GHz with 8GB of RAM. The set of benchmarks chosen for our evaluation is composed of two benchmarks presented in earlier publications [12,13] plus instances of guarded invariants, at which our implementations excel.
Non-interference. Non-interference [16,19] is an important information-flow policy demanding that an observer of a system cannot infer any high-security input of a system by observing only low-security input and output. Reformulated, we could also say that all low-security outputs $o^{low}$ have to be equal on all system executions as long as the low-security inputs $i^{low}$ of those executions are the same: $\forall \pi, \pi'.\ (o^{low}_\pi \leftrightarrow o^{low}_{\pi'})\ \mathcal{W}\ (i^{low}_\pi \nleftrightarrow i^{low}_{\pi'})$. This class of benchmarks was used to evaluate RVHyper [13], an automata-based runtime verification tool for HyperLTL formulas. We repeated the experiments and depict the results in Fig. 2. We chose a trace length of 50 and monitored non-interference on 1000 randomly generated traces, where we distinguish between a 64-bit input (left) and a 128-bit input (right). For 64-bit input, our BDD implementation performs comparably well to RVHyper, which statically constructs a monitor automaton. For 128-bit input, RVHyper was not able to construct the automaton in reasonable time. Our implementation, however, shows promising results for this benchmark class, which pushes the automata-based construction to its limit.
Detecting Spurious Dependencies in Hardware Designs. The problem of whether input signals influence output signals in hardware designs was considered in [13]. Formally, we specify this property as the following HyperLTL formula: $\forall \pi_1 \forall \pi_2.\ (o_{\pi_1} \leftrightarrow o_{\pi_2})\ \mathcal{W}\ (\overline{i}_{\pi_1} \nleftrightarrow \overline{i}_{\pi_2})$, where $\overline{i}$ denotes all inputs except $i$. Intuitively, the formula asserts that for every pair of execution traces $(\pi_1, \pi_2)$ the value of $o$ has to be the same until there is a difference between $\pi_1$ and $\pi_2$ in the input vector $\overline{i}$, i.e., the inputs on which $o$ may depend. We consider the same hardware and specifications as in [13]. The results are depicted in Table 1. Again, the BDD implementation handles this set of benchmarks well. The biggest difference can be seen between the runtimes for counter2. This is explained by the fact that this benchmark demands the highest number of observed traces, and therefore the impact of the quadratic runtime costs in the number of traces dominates the result. We can, in fact, clearly observe this correlation between the number of traces and the runtime in RVHyper's performance over all benchmarks. Our constraint-based implementations, on the other hand, do not show this behavior.
Guarded Invariants. We consider a new class of benchmarks, called guarded invariants, which express a certain invariant relation between two traces that is, additionally, guarded by a precondition. Fig. 3 shows the results of monitoring an arbitrary invariant $P : \Sigma \rightarrow \mathbb{B}$ of the following form: $\forall \pi, \pi'.\ (\bigvee_{i \in I} i_\pi \nleftrightarrow i_{\pi'}) \rightarrow (P(\pi) \leftrightarrow P(\pi'))$. Our approach significantly outperforms RVHyper on this benchmark class, as the conjunct splitting optimization, described in Section 4.1, synergizes well with SAT-solver implementations.
Atomic Proposition Scalability. While RVHyper is inherently limited in its scalability concerning formula size, as the construction of the deterministic monitor automaton gets increasingly hard, the rewrite-based solution is not affected by this limitation. To put it to the test, we ran the SAT-based implementation on guarded invariant formulas with up to 100 different atomic propositions. Formulas have the form $\forall \pi, \pi'.\ (\bigwedge_{i=1}^{n_{in}} (in_{i,\pi} \leftrightarrow in_{i,\pi'})) \rightarrow (\bigvee_{j=1}^{n_{out}} (out_{j,\pi} \leftrightarrow out_{j,\pi'}))$, where $n_{in}$ and $n_{out}$ represent the number of input and output atomic propositions, respectively. Results can be seen in Fig. 4. Note that RVHyper already fails to build monitor automata for $n_{in} + n_{out} > 10$.

Conclusion
We continued the success story of rewrite-based monitors for trace properties by applying the technique to the runtime verification problem of hyperproperties. We presented an algorithm that, given a $\forall^2$ HyperLTL formula, incrementally constructs constraints that represent requirements on future traces, instead of storing traces during runtime. Our evaluation shows that our approach scales in parameters where existing automata-based approaches reach their limits.