Verified First-Order Monitoring with Recursive Rules

First-order temporal logics and rule-based formalisms are two popular families of specification languages for monitoring. Each family has its advantages and only few monitoring tools support their combination. We extend metric first-order temporal logic (MFOTL) with a recursive let construct, which enables interleaving rules with temporal logic formulas. We also extend VeriMon, an MFOTL monitor whose correctness has been formally verified using the Isabelle proof assistant, to support the new construct. The extended correctness proof covers the interaction of the new construct with the existing verified algorithm, which is subtle due to the presence of the bounded future temporal operators. We demonstrate the recursive let's usefulness on several example specifications and evaluate our verified algorithm's performance against the DejaVu monitoring tool.


Introduction
In runtime verification, a monitor observes events generated by a running system and analyzes the event streams for compliance with a given specification. Temporal specification languages for monitoring are often classified as operational or declarative [10]. Operational languages explicitly describe how the monitor's input should be transformed to obtain an output. Two important subclasses of operational languages are rule-based formalisms [2,13] and stream runtime verification (SRV) languages [6,8,11,20]. Both formulate the transformations as recursive equations. In contrast, declarative languages, such as first-order temporal logics [4,15], describe the output by composing high-level operators.
Operational and declarative languages have complementary advantages: declarative languages let specification authors focus on the "what" and not the "how", whereas operational languages offer the authors more control over the evaluation. Most runtime verification tools do not support mixing the paradigms, especially when it comes to parametric, i.e., first-order, specification languages. A notable exception is the recent addition of recursive rules to past-time first-order temporal logic (PFLTL), implemented in the DejaVu monitoring tool [14]. As another important benefit, recursive rules can express operations like transitive closure that are not expressible in first-order logics.
In this paper, we introduce recursion in metric first-order temporal logic (MFOTL) [4] in the form of a recursive let construct. We develop and implement an evaluation algorithm for MFOTL with recursion in VeriMon [3,21], an MFOTL monitor whose correctness has been formally verified in the Isabelle proof assistant. To this end, we extend the formal correctness proof to cover the recursive let construct.
Unlike PFLTL, MFOTL supports bounded future temporal operators and aggregations (Section 2). The interaction of recursion with bounded future operators is subtle. To avoid non-termination, DejaVu requires all recursive occurrences to be guarded by a previous operator. We similarly require the recursive occurrences to be guarded in our monitor, but we relax the requirement on the guard to other past-time operators which ensure that their subformulas are evaluated strictly in the past. Moreover, we allow future operators in the recursive let construct, as long as no recursion takes place in the future operator's arguments. These restrictions ensure that the fixpoint given by the recursive let operator is well-defined. At the same time, they are permissive and allow us to formulate interesting examples, several of which are beyond what PFLTL with recursion can express.
Consider a specification that aims to secure hosts in a network that communicate with each other and with the outside world. A host is tainted by an address range iff there is a chain of communication from the address to the host and all hosts on the chain trigger an intrusion detection alert within one hour after communicating with the previous host. This specification can be expressed directly using our recursive let construct (to model chains of communication) and future temporal operators (to specify "within one hour after").
We start by extending MFOTL with a non-recursive let operator (Section 3). This special case is mainly of pedagogical value: aspects common to both let operators are easier to explain on the simpler non-recursive variant. Yet, this construct is useful in practice to structure complex formulas and improve monitoring performance by sharing common subformulas. Thus we extend VeriMon's algorithms and proofs with the non-recursive let.
We then introduce the recursive let operator (Section 4.1), exemplify its semantics with several specifications (Section 4.2), and develop the monitoring algorithm and sketch its correctness (Section 4.3). VeriMon's repository [24] contains complete formal proofs.
This work is part of the long-term effort to develop a trustworthy monitor that surpasses other, non-verified tools in expressiveness and efficiency. In this work, our focus is on expressiveness (and trustworthiness). Nonetheless, we evaluate our algorithmic additions to VeriMon on a micro-benchmark and observe that, even without further optimizations, its performance is incomparable to DejaVu's (Section 5). Moreover, we detected a problem in DejaVu's handling of variable names in recursive subformulas.
In summary, our main contribution is the extension of MFOTL with a recursive let operator and the design of an evaluation algorithm for it. Along the way, we introduce a non-recursive let operator, which proved essential when writing complex specifications. Our contributions are implemented as part of VeriMon and proved correct using Isabelle.
Related Work. Our work adds rule-based specification features [13] to a first-order specification language [16]. Above we describe our contribution's relationship to DejaVu and VeriMon, two monitors for first-order temporal specifications. VeriMon's algorithm [21], which we extend, is based on the algorithm used in the MonPoly monitor [5], although VeriMon has optimizations that are not present in MonPoly and vice versa [3]. VeriMon supports a more expressive specification language than MonPoly, and our introduction of the recursive let has increased the gap between the two. VeriMon's and MonPoly's algorithms work with finite relations. These tools are thus restricted to MFOTL's monitorable fragment [4], which ensures that all subformulas evaluate to finite results. In contrast, DejaVu finitely represents infinite relations using BDDs and thus supports the full PFLTL (but only closed formulas). Both DejaVu and our work restrict the recursive let syntactically. Other rule-based [2,13] and SRV-based monitors [6,8,11,20] can express the temporal operators present in LTL, but struggle with extensions that introduce parameters. Even for the operators they can express, specialized algorithms that are carefully tuned for the operators tend to exhibit a better performance. Instead of encoding temporal operators, we take the opposite approach and enrich a monitor that uses specialized algorithms for temporal operators with general-purpose recursion.
Datalog [1] adds recursion to first-order logic, similarly to our addition of recursion to temporal logic. However, Datalog has no built-in notion of time and hence other measures must be taken to ensure that the fixpoints are well-defined, e.g., by restricting negation. Restricting the recursive occurrences to be strictly in the past is a natural and expressive alternative for monitoring, as we do not restrict negation beyond what the monitorable fragment requires. Works on Datalog extensions with metric temporal operators [7,19,22] mostly study the decidability and complexity of computational problems related to these extensions, whereas we design, implement, and formally verify an executable algorithm.

Metric First-Order Temporal Logic
MFOTL extends linear temporal logic with first-order quantification, past-time operators, and interval bounds on the temporal operators [4]. The VeriMon monitor [3] supports a fragment of this logic. It also adds new features, specifically regular matching operators as in linear dynamic logic [9], which results in metric first-order dynamic logic (MFODL), as well as aggregations. Our extension of VeriMon with recursive rules retains the additional features of MFODL. However, the additional features are orthogonal to our extension and hence we base our presentation in this paper on MFOTL with aggregations.
We summarize MFOTL's syntax and semantics, as well as the monitorable fragment. The presentation generally follows the Isabelle formalization; however, we sometimes deviate from Isabelle's concrete syntax for simplicity. We begin by defining some auxiliary types (top of Fig. 1). The logic's universe (type data) is fixed and infinite: it is a disjoint sum of integers, 64-bit IEEE floats, and strings of 8-bit characters. Databases (type db) encode first-order structures as functions from predicate names to relations over data. Relations are represented as sets of lists. A trace is a stream (an infinite sequence) of time-stamped databases. Time-stamps (type ts) are modeled as natural numbers (type nat). We write Γ σ i for the ith database in σ, and T σ i for its time-stamp. The predicate trace enforces monotone and eventually increasing time-stamps, i.e., ∀i ≤ j. T σ i ≤ T σ j and ∀x. ∃i. x < T σ i. Non-empty intervals (type I) are represented by their end-points. We write [a, b] for the unique interval satisfying n ∈_I [a, b] iff a ≤ n ≤ b, where n ∈_I I denotes that I contains the natural number n. The interval is unbounded from above if b = ∞, which the type enat adds to the natural numbers.
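To make the two conditions above concrete, the following is a minimal Python sketch of interval membership and of the monotonicity part of the trace predicate, checked on a finite prefix. The function names are hypothetical and not part of the Isabelle formalization; the "eventually increasing" part cannot be checked on a finite prefix and is left out.

```python
from math import inf

def mem_interval(n, a, b):
    """n ∈_I [a, b]; pass b = math.inf for an interval unbounded from above."""
    return a <= n <= b

def monotone_prefix(timestamps):
    """Monotonicity half of the trace predicate on a finite prefix:
    time-stamps never decrease from one time-point to the next."""
    return all(s <= t for s, t in zip(timestamps, timestamps[1:]))
```

Note that equal time-stamps at adjacent time-points are allowed, so `monotone_prefix([0, 0, 3])` holds.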
Terms (type trm) are constructed recursively from variables (represented by De Bruijn indices), constants, and arithmetic operators. We use named variables in examples and omit the V and C constructors. There are two kinds of atomic formulas (type frm): flexible predicates of the form p(as), where as is a list of terms, and rigid predicates t_1 • t_2 for • ∈ {=, <, ≤}, which have a fixed interpretation. Formally, the existential quantifier ∃ does not carry a variable name because of the De Bruijn encoding. We use fv α to denote the set of De Bruijn indices of α's free variables.
The semantics is given by the functions etrm and sat (Fig. 1). Both depend on a valuation, which is a data list assigning a value to each variable. The satisfaction function sat for formulas additionally depends on a trace σ and a time-point i, which is an index into the trace. Indexing into lists is denoted by v ! x, the operation z # v prepends the value z to the list v, and @ concatenates two lists. The notation {x ..< y} and {x <.. y} is shorthand for the sets {x, x + 1, ..., y − 1} and {x + 1, x + 2, ..., y} of natural numbers, respectively.
An aggregation formula y ← Ω(t; b) ϕ binds b variables in the subformula ϕ; the remaining free variables of ϕ are used for grouping. Each group is assigned an aggregate value y, which is computed by first evaluating the term t on each valuation that matches the group and that satisfies ϕ, then aggregating the results using the operator Ω (e.g., MIN for minimum). To this end, eval_agg_op Ω M (not shown) applies Ω to a set M of value-multiplicity pairs [3]; card_∞ Z is the cardinality of Z, or ∞ if Z is infinite. The conjunct M = {} ⟶ fv ϕ ⊆ {0 ..< b} ensures that the formula is satisfied by the aggregate value of an empty M only if there are no grouping variables. Otherwise, infinitely many groups would be labeled with that value, rendering such aggregations non-monitorable.
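The grouping step can be sketched in Python as follows. This is only an illustration under stated assumptions: valuations are tuples whose first b components are the bound variables (De Bruijn indices 0 to b − 1) and whose remaining components identify the group; `aggregate`, `agg_min`, and `agg_sum` are hypothetical names, not VeriMon's eval_agg_op.

```python
from collections import Counter, defaultdict

def aggregate(valuations, b, term, op):
    """Group the satisfying valuations by their non-bound components and
    apply op to each group's multiset M of term values."""
    per_group = defaultdict(Counter)
    for v in set(valuations):
        # M maps each value of the term to its multiplicity within the group.
        per_group[tuple(v[b:])][term(v)] += 1
    return {group: op(m) for group, m in per_group.items()}

def agg_min(m):
    return min(m)  # multiplicities are irrelevant for the minimum

def agg_sum(m):
    return sum(value * mult for value, mult in m.items())
```

With no satisfying valuations the result is empty, i.e., no group is labeled. This mirrors the monitorable case with grouping variables; the special case of an empty M without grouping variables is not modeled here.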
The decidable predicate mon :: frm ⇒ bool specifies the monitorable fragment. We omit its formal definition and refer to the earlier descriptions of VeriMon [3,21] for details. Intuitively, mon places restrictions on the formula's structure to ensure that all subformulas have finitely many satisfying valuations. Also, the interval I of every U_I operator must be bounded. A monitor for a monitorable formula can thus compute a finite set of satisfying valuations for every time-point after observing a sufficiently long trace prefix.

Non-Recursive Let Operator
We first introduce a non-recursive let operator Let string := frm in frm to the frm datatype. The formula Let p := α in β associates the formula α with the predicate named p, which may be used in the formula β. We call such a predicate let-bound. The operator is non-recursive: p has the same meaning within α as in the surrounding context (unless it is bound by a nested let in α). Although the non-recursive let operator does not enhance MFOTL's expressiveness, it improves readability (by using descriptive let-bound predicate names), as well as modularity and evaluation efficiency (by sharing subformulas).
Intuitively, the meaning of Let p := α in β is the same as that of β after replacing all its predicates of the form p(as) with the formula α, whose free variables have been replaced with the terms as in a capture-avoiding way. The formal syntax does not specify explicitly how α's free variables map to p's arguments. The mapping is induced by the De Bruijn indices: the variable with index 0 becomes the first argument, and so forth. We list the arguments explicitly in examples that use named variables. For instance, the formula Let p(x) := p(x) ∧ ∃y. q(x, y) in [0,2] p(y) should be equivalent to [0,2] (p(y) ∧ ∃z. q(y, z)). We achieve this by defining Let's semantics as follows.

sat σ v i (Let p := α in β) = sat (σ[p ⇒ λj. satrel σ j α]) v i β

We write satrel σ j α as an abbreviation for {v. sat σ v j α ∧ length v = nfv α}, i.e., the relation containing the valuations that satisfy α. The function nfv α returns the minimum length of v needed to cover all of α's free variables, i.e., 0 if α is closed and Max (fv α) + 1 otherwise. The trace σ[p ⇒ R] is the same as the trace σ except that for every time-point i, the database at i maps the predicate name p to R i, where R has type nat ⇒ data list set and is called a temporal relation. Note that the subformula α is not necessarily evaluated at time-point i. Instead, the choice of the time-point is deferred until the predicate p is used within β, which we achieve by updating the entire trace. This supports the intuition behind unfolding the let operator Let p := α in β described above, especially as subformulas p(as) may occur under temporal operators in β.
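The two auxiliary notions used here, nfv and the trace update with a temporal relation, can be sketched in Python as follows. The encoding is an assumption made for illustration: a trace is a function from time-points to dictionaries (databases), and a temporal relation is a function from time-points to sets of tuples; `update_trace` is a hypothetical name.

```python
def nfv(fv):
    """Minimum valuation length covering the free variables with
    De Bruijn indices fv: 0 if closed, Max fv + 1 otherwise."""
    return 0 if not fv else max(fv) + 1

def update_trace(trace, p, rel):
    """Same trace, except that at every time-point i the predicate
    name p is mapped to rel(i)."""
    def updated(i):
        db = dict(trace(i))  # copy, so the original trace is unchanged
        db[p] = rel(i)
        return db
    return updated

# Hypothetical example: q holds for the time-point's own index.
trace = lambda i: {"q": {(i,)}}
doubled = update_trace(trace, "p", lambda i: {(2 * i,)})
```

Because the whole trace is updated, a use of p under a temporal operator looks up `rel` at whatever time-point that operator selects, not at the time-point where the let operator itself is evaluated.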

Implementation.
To evaluate an MFOTL formula on a trace, VeriMon computes a finite set of satisfying valuations (represented by the type table) recursively for each subformula. It applies standard table operations such as the natural join (⋈) and union.
Tables are sets of tuples, which are lists of optional data values (with missing values denoted by ⊥) and thus refine valuations. This representation allows us to use lists of the same length for subformulas with different free variables. As with valuations, the variables' De Bruijn indices are used to look up their value in a tuple. VeriMon processes an unbounded trace incrementally. Its interface consists of two functions, init :: frm ⇒ state and step. The function init initializes the monitor's state (type state), and step updates it with a batch of new time-stamped databases to produce a list of new satisfactions. Instead of db list, step uses the type dbs = (string ⇀ table list) (a partial mapping from string to table list) to efficiently retrieve all relations (encoded as tables) associated with a predicate name at once. Besides some auxiliary data, state stores an inductive state of type sfrm that mirrors the inductive representation of formulas, augmented with data structures for evaluating temporal operators and buffering intermediate results. Internally, step (dbs, tss) st calls eval j n tss dbs s_ϕ, where j is the combined length of the trace prefix including the new batch, n = nfv ϕ for the monitored formula ϕ, and s_ϕ is the inductive state, all stored in st. The function eval returns a list of tables with new satisfactions, as well as the updated inductive state. Satisfactions are reported for every time-point in order. They may be delayed if the formula contains future operators.
To evaluate Let p := α in β, we use the tables with α's satisfactions to evaluate p within β, which requires that the tuples in these tables do not have missing values. Therefore, we require that let operators satisfy mon (Let p := α in β) = ({0 ..< nfv α} ⊆ fv α ∧ mon α ∧ mon β). Specifically, the (indices of) α's free variables must not have gaps. We add the constructor SLet p m s_α s_β to the inductive state, which stores p, the number m = nfv α of free variables in α, and the states for subformulas α and β. It is initialized by initializing s_α and s_β recursively. The function eval evaluates it as follows.

eval j n tss dbs (SLet p m s_α s_β) = (let (xs, s_α') = eval j m tss dbs s_α; (ys, s_β') = eval j n tss (dbs[p → xs]) s_β in (ys, SLet p m s_α' s_β'))

We write dbs[p → xs] for the partial mapping dbs updated at p with xs. The recursive call of eval on s_α may return multiple tables in the list xs. Note that step generalizes the original VeriMon interface [3] as it consumes multiple time-stamped databases at once. The generalized interface of eval allows us to pass all tables at once to the recursive call for s_β.
Correctness. We relate the outputs of step and sat to prove our monitor correct. As mentioned earlier, the monitor may delay its output. We precisely characterize its progress for a given formula and trace prefix. Intuitively, the progress is the number of time-points that the monitor is able to evaluate given a trace prefix. Progress is a useful tool in the correctness proof as it helps us describe the output at every time-point. Moreover, we show below that progress can be made arbitrarily large, which is important for completeness. Formally, prog σ P ϕ j is ϕ's progress i_ϕ after reading the first j databases of trace σ. We added the partial mapping P that assigns to every let-bound predicate its own progress, i.e., the progress of the formula defining the predicate. For example, the progress of a predicate p that is not let-bound is j. Otherwise, it is equal to the progress of the formula it is bound to (stored in P p). The progress of α U_[a,b] β is the smallest i such that
The invariant invar σ j P n s_ϕ ϕ relates an inductive state s_ϕ to the formula ϕ. The inductive state must reflect the monitor's state after processing the first j databases in the trace σ, assuming that P specifies the let-bound predicates' progress. The parameter n is the length of the tuples stored within s_ϕ. The invariant is defined inductively over s_ϕ; we reuse VeriMon's definition for the MFOTL operators and add a case for Let:
The first two premises restrict the subformula states s_α and s_β, where s_β reflects the evaluation of β on the modified trace, and p's progress is that of α. The premise m = nfv α enforces that m is equal to p's arity, and {0 ..< m} ⊆ fv α is the constraint from mon.
Our extensions preserve the monitor's correctness: we formally proved the theorem below, which characterizes the monitor's eval function. The theorem is stated here for the empty progress mapping ∅, which must be generalized in the proof (as P changes in the above rule). Let δ be a natural number and ϕ be a monitorable formula with n = nfv ϕ. The function the maps the optional value x to x and ⊥ to some unspecified value.
Soundness follows immediately from Thm. 1, whereas completeness additionally requires the aforementioned property that any progress can be reached by making the trace prefix long enough, which we also proved for our modified progress function:
Theorem 2. If mon ϕ, then for all i there exists a j such that prog σ ∅ ϕ j ≥ i.

Past-Recursive Let Operator
It is well-known that first-order logic (FOL) cannot express certain queries, notably the transitive closure of a binary relation. This remains true when restricted to finite structures [18]. Although MFOTL is rather different from ordinary FOL, we conjecture that it cannot express transitive closure either. This hampers its ability to model hierarchies of unbounded depth. Moreover, recursive patterns are sometimes the most natural way to express certain specifications. We describe an extension of MFOTL that can encode a "temporally directed" form of transitive closure and other recursive patterns.
Specifically, we introduce another let operator in which the predicate may refer to itself recursively. The intended semantics is that of a fixpoint, i.e., the predicate p defined by a formula α should be interpreted by a temporal relation that is equal to the evaluation of α under that interpretation of p. The fixpoint might not always exist or it might not be unique. Therefore, different fixpoint operators have been studied in the context of non-temporal logics and query languages [1]. For instance, it is common to require that all recursive occurrences of p in its defining formula are positive, i.e., under an even number of negations. This ensures monotonicity and hence the existence of a least fixpoint.
MFOTL's future operators are interpreted over infinite traces. This poses a new challenge for monitoring recursively defined predicates, even if we restrict our attention to positive formulas. Consider the recursive definition of p by q ∨ ○[0,∞] p, where q is a predicate from the trace and ○ is the next operator. Although q ∨ ○[0,∞] p is monitorable (at most one additional time-point must be known to evaluate it), the recursive definition of p is equivalent to ♦[0,∞] q under the least fixpoint semantics. However, ♦[0,∞] q is not monitorable, as one might need the entire, infinite trace to evaluate it. Therefore, we focus on a fragment where every recursive occurrence of p must be strictly in the past. This guarantees a unique fixpoint even if the defining formula is not monotone, so the predicate may occur negatively as well.
The syntax of our past-recursive let operator is similar to the one of Let: we add the constructor LetPast string := frm in frm to the frm datatype. However, the semantics is different (Section 4.1). The restriction to strictly past recursion is enforced by a syntactic monitorability condition that is checked by mon. Consider the formula LetPast p := α in β. Intuitively, every recursive occurrence of p in α must be guarded by at least one strictly past operator, and there must be no future operator on the path from the occurrence to α's root. We do allow future operators in the other parts of α, though.
We give examples of LetPast in Section 4.2. The evaluation of LetPast requires an extension of VeriMon's algorithm (Section 4.3), which we also formally prove correct.

Semantics
The semantics of the past-recursive let operator is defined by the equation

sat σ v i (LetPast p := α in β) = sat (σ[p ⇒ recp (λR j. satrel (σ[p ⇒ R]) j α)]) v i β

We evaluate β at the same time-point i as the recursive let operator using an appropriately updated trace. The temporal relation assigned to p is computed by the combinator recp:

recp f i = f (λj. if j < i then recp f j else {}) i

The argument f is a function that transforms temporal relations, and recp f returns again a temporal relation. Intuitively, recp f evaluates to the fixpoint f (recp f), except that f R i can only access time-points of R before i. For all other time-points j ≥ i, the relation R j is empty. The combinator recp is well-defined because i is a natural number; the recursive call recp f j affects the result only if j < i and hence we can prove termination using i as a variant. For the semantics of LetPast, we choose f R i = satrel (σ[p ⇒ R]) i α, i.e., the satisfactions of α with p mapped to f's argument R, to which recp supplies the result of the recursive evaluation (up to but excluding i).
Our definition of sat is total: it gives meaning to every formula. This includes formulas LetPast p := α in β where p occurs in α without a past guard or under a future operator. However, the semantics behaves unexpectedly in such cases. For example, LetPast p := (q ∨ ○[0,∞] p) in p is equivalent to q. Our monitor therefore requires properly guarded formulas. Not only does this avoid confusion about the semantics, it also simplifies the implementation because the monitor need not eliminate unguarded occurrences.
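The behavior of recp on guarded and unguarded definitions can be replayed with a small Python sketch. It is an illustration under simplifying assumptions, not the Isabelle definition: relations are sets of tuples, the trace is a hypothetical propositional list `q`, and memoization via `lru_cache` is added only to avoid recomputation.

```python
from functools import lru_cache

def recp(f):
    """Build a temporal relation from f, letting f see only the results
    for strictly earlier time-points; later ones appear empty."""
    @lru_cache(maxsize=None)
    def rel(i):
        def strictly_past(j):
            return rel(j) if j < i else frozenset()
        return frozenset(f(strictly_past, i))
    return rel

q = [False, False, True, False]            # hypothetical trace of q
holds = lambda i: {()} if q[i] else set()  # {()} encodes "true"

# Guarded: p := q or (previous p), i.e., q has held at some point so far.
once_q = recp(lambda R, i: holds(i) | (R(i - 1) if i > 0 else set()))

# Unguarded, through the future: p := q or (next p).
next_q = recp(lambda R, i: holds(i) | R(i + 1))
```

In the unguarded case the lookup at i + 1 always sees the empty relation, so `next_q` coincides with q itself, matching the observation that such a definition collapses to q rather than to "eventually q".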
Next, we describe the formalization of the syntactic restriction. The idea is to determine for every predicate whether it is used strictly in the past by analyzing the formula recursively. The datatype recSafety (Fig. 2) represents the possible outcomes. U(nused) means that a predicate does not occur in the formula. P(ast) means that it is evaluated at strictly earlier time-points, whereas NF (Non-Future) additionally allows the current time-point. A(ny) covers all remaining cases. The linear order < on recSafety is induced by U < P < NF < A. Its reflexive closure ≤ corresponds to implication. For example, if the predicate p is unused (U), it is clearly evaluated at earlier time-points only (P). The least upper bound x ⊔ y with respect to ≤ corresponds to logical disjunction.
The function slp p ϕ (Fig. 2) analyzes the past-guardedness of a predicate p in a formula ϕ. It uses a composition operator y * x on recSafety. The patterns in the definition of * should be matched sequentially from top to bottom; e.g., A * U is equal to U. Intuitively, y * x describes the guardedness of a predicate that is x-used in some subformula, which is then y-used. For example, slp p (●_I ϕ) = P * slp p ϕ because ϕ and all occurrences of p therein are evaluated at time-points that are strictly in the past relative to ●_I ϕ. Note that we make a case distinction for α S_I β: if the interval I excludes zero, β is always evaluated strictly in the past. Future operators always result in A if p is used in an operand.
Finally, we define the mon predicate for the recursive let operator:
The only difference to Let is the restriction of p's occurrences in α via slp, which is generally an over-approximation. For example, slp p ( I I I p) = A even though p is evaluated at strictly earlier time-points. Therefore, some instances of LetPast that our algorithm could evaluate correctly are not considered to satisfy mon. We plan to replace recSafety with a more precise lattice in future work.
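The guardedness analysis can be approximated by the following Python sketch. It is a hedged reading of the prose only: Fig. 2 is not reproduced in this text, so the composition table in `compose`, the tuple-based formula encoding, and the treatment of the left operand of since are assumptions chosen to agree with the examples mentioned in the text.

```python
from enum import IntEnum

class Safety(IntEnum):
    U = 0    # unused
    P = 1    # strictly past
    NF = 2   # non-future (past or current time-point)
    A = 3    # any

def compose(y, x):
    """Assumed reading of y * x, matched top to bottom."""
    if x == Safety.U:
        return Safety.U              # e.g. A * U = U
    if y == Safety.A or x == Safety.A:
        return Safety.A
    if y == Safety.P or x == Safety.P:
        return Safety.P
    return Safety.NF

def slp(p, phi):
    """Formulas are tuples: ("pred", name), ("neg", f), ("and"/"or", f, g),
    ("prev", f), ("next", f), ("since", zero_in_interval, f, g)."""
    tag = phi[0]
    if tag == "pred":
        return Safety.NF if phi[1] == p else Safety.U
    if tag == "neg":
        return slp(p, phi[1])
    if tag in ("and", "or"):
        return max(slp(p, phi[1]), slp(p, phi[2]))  # least upper bound
    if tag == "prev":
        return compose(Safety.P, slp(p, phi[1]))
    if tag == "next":
        return compose(Safety.A, slp(p, phi[1]))
    if tag == "since":
        _, zero_in, alpha, beta = phi
        beta_use = Safety.NF if zero_in else Safety.P
        return max(compose(Safety.NF, slp(p, alpha)),
                   compose(beta_use, slp(p, beta)))
    raise ValueError(f"unsupported formula: {tag}")

# Body of a since-like recursive definition: beta or (alpha and previous s).
psi = ("or", ("pred", "beta"), ("and", ("pred", "alpha"), ("prev", ("pred", "s"))))
```

Using the linear order as an `IntEnum` makes the least upper bound a plain `max`. A monitorability check for the recursive let would compare the result for the bound predicate against P; the exact comparison used by mon is not shown in the text and is therefore not encoded here.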

Examples
Temporal Operators. We first show that the non-metric S operator can be reduced to LetPast and the previous operator ●. (We omit the interval subscripts if the interval is [0, ∞].) Using the special ts(t) predicate, which is true iff t is the current time-stamp, we can also express the metric version. This example serves to gently illustrate the semantics of LetPast. In general, formulas are more readable if they are directly expressed in terms of S, and monitoring can be more efficient. Below we give further examples in which LetPast adds expressiveness. Let α and β be two monitorable MFOTL formulas with free variables fv α and fv β, respectively. The formula α S β is monitorable only if fv α ⊆ fv β, so let us assume that, too. The following unfolding of S's semantics is well-known:

sat σ v i (α S β) ⟷ sat σ v i β ∨ (sat σ v i α ∧ i > 0 ∧ sat σ v (i − 1) (α S β))   (1)

As the unfolding recursively evaluates the formula at the previous time-point, we can directly translate it into a recursive let operator: ϕ_S ≡ LetPast s(x) := ψ in s(x), where ψ ≡ β ∨ (α ∧ ● s(x)). The predicate name s must be fresh, i.e., it must not occur in α nor β. The variable list x enumerates fv β. The formula ϕ_S is monitorable because s(x) is clearly past-guarded, and hence slp s ψ = P. (We also need fv β = {0 ..< nfv β}, which can be achieved by renaming variables in α and β.) Let us analyze the semantics of ϕ_S:
These equations hold for all valuations v of length nfv β and if the variables x are ordered by their De Bruijn indices.
Step (*) exploits the freshness of s with respect to α and β, which allows us to replace σ[s ⇒ ...] by σ. The equations result in the same unfolding as (1). Hence, we can prove the semantic equivalence of ϕ_S and α S β by induction on i.
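The equivalence can also be checked experimentally in a propositional setting. The Python sketch below is an illustration under simplifying assumptions (no free variables, finite trace prefixes, non-metric since); `recp`, `since_direct`, `since_letpast`, and `agree` are names introduced here, and the recursive definition used is "β, or α together with the previous value of s".

```python
from functools import lru_cache
from itertools import product

def recp(f):
    """f may only observe results at strictly earlier time-points."""
    @lru_cache(maxsize=None)
    def rel(i):
        return frozenset(f(lambda j: rel(j) if j < i else frozenset(), i))
    return rel

def since_direct(alpha, beta, i):
    """Standard semantics: beta held at some j <= i and alpha held at
    every time-point after j up to and including i."""
    return any(beta[j] and all(alpha[k] for k in range(j + 1, i + 1))
               for j in range(i + 1))

def since_letpast(alpha, beta):
    """s := beta or (alpha and previous s), evaluated through recp."""
    def body(R, i):
        prev_s = i > 0 and bool(R(i - 1))
        return {()} if beta[i] or (alpha[i] and prev_s) else set()
    return recp(body)

def agree(n):
    """Compare both definitions on all propositional prefixes of length n."""
    for alpha in product([False, True], repeat=n):
        for beta in product([False, True], repeat=n):
            s = since_letpast(alpha, beta)
            if any(bool(s(i)) != since_direct(alpha, beta, i) for i in range(n)):
                return False
    return True
```

Calling `agree(4)` enumerates all 256 pairs of prefixes of length 4. Such a test is of course no substitute for the inductive proof mentioned above.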
Here, t and u are fresh variables, where t records the time-stamp of the past satisfaction of β, whereas u is the time-stamp at which we evaluate SinceLet. The subformula which is part of S_[a,b]'s semantics (Fig. 1).
Temporally-Directed Transitive Closure. We proceed by showing that LetPast can compute a temporally-directed transitive closure over events observed at a sequence of distinct time-points. Hence, we assume that the trace contains a single event at every time-point. The closure is directed in the sense that the transitive chains can only be extended by newer events. We consider the following two types of events from [14]: r(y, x, d) denotes that process y reports some data d to another process x, and s(x, y) denotes that process x spawns process y. The Spawn formula encodes violations of the property that whenever process y sends some data d to a process x, denoted as r(y, x, d), then there was a chain of process spawns: s(x, x_1), s(x_1, x_2), ..., s(x_k, y), occurring in this order in the trace. In other words, a process may only send data to its "ancestors". To check this property, a monitor needs to compute the (temporally-directed) transitive closure p(u, v) of the relation s. The definition of the closure has two recursive predicate instances with different arguments. The Spawn formula is inspired by a similar one used to evaluate the DejaVu monitor [14]. Unlike DejaVu, we do not require the formula to be closed and thus leave the variables x, y, and d free.
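The property described in prose can be illustrated by a direct, incremental computation of the temporally-directed closure. The Python sketch below does not evaluate an MFOTL formula and is not the Spawn formula itself; the event encoding as tuples and the function name `spawn_violations` are assumptions made for this example.

```python
def spawn_violations(events):
    """events: one event per time-point, either ("s", x, y) for a spawn
    or ("r", y, x, d) for a report from y to x with data d.
    Returns (time-point, y, x, d) for every report to a non-ancestor."""
    closure = set()      # pairs (u, v): a chain of spawns leads from u to v
    violations = []
    for i, event in enumerate(events):
        if event[0] == "s":
            _, x, y = event
            # A newer spawn can only extend existing chains at their end.
            closure |= {(x, y)} | {(u, y) for (u, v) in closure if v == x}
        elif event[0] == "r":
            _, y, x, d = event
            if (x, y) not in closure:
                violations.append((i, y, x, d))
    return violations

in_order = [("s", "a", "b"), ("s", "b", "c"), ("r", "c", "a", 1), ("r", "a", "c", 2)]
out_of_order = [("s", "b", "c"), ("s", "a", "b"), ("r", "c", "a", 1)]
```

On `in_order`, only the report from a to c is flagged. On `out_of_order`, the report from c to a is flagged as well, because the spawn of b by a arrives after the spawn of c by b and therefore does not extend a chain in temporal order.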
The Trans formula encodes violations of the same property as Spawn even if s(x, x_1), s(x_1, x_2), ..., s(x_k, y) are received by the monitor out-of-order, i.e., they do not occur in this order in the trace. We can interpret the events s(x, y) as edges in a directed graph and the predicate p(x, y) in Trans as computing the reachability of vertices in the directed graph. We also extend the directed edges s(x, y) with a weight w to s+(x, y, w). Then the Trans+ formula yields all pairs of vertices x, y and the length w of the shortest path from x to y whenever y becomes reachable from x or the length of the shortest path changes. The relation s+(x, y, w) can itself be obtained by evaluating a more complex temporal formula, e.g., s+(x, y, w) ≡ e(x, y, w) ∧ ¬♦[0,10] d(x, y) with the following two types of events: e(x, y, w) denotes an edge from x to y with weight w; d(x, y) denotes deletion of the edge from x to y. The eventually operator ♦_I ϕ abbreviates (∃x. x = x) U_I ϕ. Such a relation s+(x, y, w) contains all edges that are not revoked within 10 time units after receiving e(x, y, w). We could use the non-recursive let operator Let s+(x, y, w) := e(x, y, w) ∧ ¬♦[0,10] d(x, y) to precompute the relation and use it when evaluating the recursive let operator in Trans+. As another application of future operators under LetPast, recall our introductory example. Suppose that hosts in a network communicate with each other and with the outside world: comm(src, dest) indicates that host src sends a message to host dest; in(r, h) and out(h, r) indicate that the host h receives or sends traffic from or to an IP address in the range r, respectively. The hosts are equipped with an intrusion detection system (IDS), whose alerts are denoted by ids(h). We say that a host h is tainted by an address range r iff there is a chain of communication from r to h and all hosts on the chain (including h) trigger an IDS alert within one hour after communicating with the previous host. The formula
is true whenever a host communicates back to the IP range by which it was tainted.
Periodic Behavior. Suppose that we monitor a boolean signal b(x), parametrized by an integer parameter x, between the user's start(x) and stop(x) commands. An arbitrary amount of time may pass between these two commands. Our task is to detect periodic activations of b(x), with a fixed period t > 0 and error tolerance 0 ≤ ε < t. We shall ignore positive noise in b(x), i.e., additional activations besides the periodic ones.
Let us make the task more precise. An alarm must be raised at time-point i_n iff there exist time-points i_0 < i_1 < ··· < i_n such that start(x) holds at i_0, stop(x) holds at i_n, and b(x) holds at all i_k for 1 ≤ k ≤ n − 1. Moreover, the difference of time-stamps for adjacent time-points i_k and i_{k+1}, where 1 ≤ k ≤ n − 2, must be in the interval [t − ε, t + ε]; the differences for the pairs i_0, i_1 and i_{n−1}, i_n must each be at most t + ε.
Our first attempt PB to formalize the alarm condition without recursion is where and ◆_K ϕ abbreviates (∃x. x = x) S_K ϕ. This formula follows an inductive approach: every b(x) between start(x) and stop(x) must be preceded by b(x) or start(x), with the appropriate time difference. However, PB does not ignore noise, as adding b(x) events to the trace may silence an alarm. For example, let t = 10, ε = 0, and σ be a trace starting with ({start(1)}, 0), ({b(1)}, 10), ({stop(1)}, 20). We write {p(1), p(2)} for the database where the predicate p holds for 1 and 2. On σ, PB is true at the third time-point. Inserting a database {b(1)} with time-stamp 15 falsifies PB at the now fourth time-point, although the trace still satisfies the natural language description. The following PBLet formula expresses the intended condition using LetPast: This example depends crucially on the flexible past guards we support: here, the recursion goes through ◆ with an interval constraint. Note that 0 ∉ J because we assumed ε < t.
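The natural language alarm condition can be turned into a small recursive computation, which helps to check the example trace by hand. The Python sketch below is an illustration only and not a rendering of the PBLet formula: it fixes one parameter value x, represents each database as a set of event names, and reads the condition literally, so that a start directly followed by a stop within t + ε (the case n = 1) also raises an alarm. The name `periodic_alarms` is hypothetical.

```python
def periodic_alarms(trace, t, eps):
    """trace: list of (events, timestamp) with events a subset of
    {"start", "b", "stop"}. Returns the alarm time-points (indices)."""
    chain = []    # chain[i]: time-point i ends a valid chain start, b, ..., b
    alarms = []
    for i, (events, ts) in enumerate(trace):
        extends_chain = False
        raise_alarm = False
        for j in range(i):
            prev_events, prev_ts = trace[j]
            gap = ts - prev_ts
            after_start = "start" in prev_events and gap <= t + eps
            after_b = chain[j] and t - eps <= gap <= t + eps
            if "b" in events and (after_start or after_b):
                extends_chain = True
            if ("stop" in events and gap <= t + eps
                    and (chain[j] or "start" in prev_events)):
                raise_alarm = True
        chain.append(extends_chain)
        if raise_alarm:
            alarms.append(i)
    return alarms

clean = [({"start"}, 0), ({"b"}, 10), ({"stop"}, 20)]
noisy = [({"start"}, 0), ({"b"}, 10), ({"b"}, 15), ({"stop"}, 20)]
```

The recursion only ever consults `chain[j]` for j < i, which mirrors the requirement that recursive occurrences are strictly in the past. With t = 10 and ε = 0, the alarm is raised at the last time-point of both `clean` and `noisy`, so the additional activation at time-stamp 15 is ignored as intended.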
As another example of periodic behavior, we analyze an integer-valued signal(y) between the (now non-parametric) commands start and stop. We aim to discover whether signal(y) is piecewise constant, with the constant segments being exactly t time units long. Moreover, the signal's values for subsequent segments must differ by at most δ. The next formula uses the general S operator as the recursion guard to capture this property.
Turing Machines. Every MFOTL formula can be viewed as a function on traces, where the function's output is the set of satisfying valuations, either at a fixed time-point or at all time-points. VeriMon's monitorable fragment guarantees that one can compute the valuation at every time-point. Thus, monitorable formulas correspond to computable functions. If we give up on the requirement that the function's output must be available at a fixed time-point, the past-recursive let operator is expressive enough to simulate arbitrary Turing machines (TM). This is not a contradiction: we simulate a single TM step at every time-point, and there is an infinite supply of time-points. Running the monitor on a configuration that does not halt will never produce an output, i.e., a nonempty set of satisfying valuations.
Let M = ⟨Σ, b, Q, q_0, q_f, δ⟩ be a deterministic TM with tape alphabet Σ, blank symbol b ∈ Σ, control states Q, initial state q_0 ∈ Q, final state q_f ∈ Q, and transition function δ ∈ (Q × Σ → Q × Σ × {−1, 0, 1}). Whenever the machine is in state q_1 and reads the symbol s_1, it enters state q_2, writes the symbol s_2, and moves the head by m tape cells to the right, where δ(q_1, s_1) = ⟨q_2, s_2, m⟩. Without loss of generality, we assume that Σ and Q are finite subsets of the integers. We simulate M using the formula ϕ_M shown below.
LetPast cfg(q, i, s) := Let cfg(q, i, s) := cfg(q, i, s) in Let head(q, s) := cfg(q, 0, s) ∨ ¬(∃x, z. cfg(x, 0, z)) ∧ (∃y, z. cfg(q, y, z)

The idea is that cfg represents the current configuration of the TM. Specifically, cfg(q, i, s) holds if the machine is in control state q and the tape contains the symbol s in the i-th cell to the right of the head (i may be negative). Note that we use nested, non-recursive let operators to abbreviate repeated subformulas. In the body of Let cfg(q, i, s) := cfg(q, i, s) in ..., the predicate cfg refers to the previous configuration. The predicate head provides the current state and the symbol under the head. Its definition extends the tape by a blank symbol if necessary. The simulation is started at time-point 0 by providing the tape's initial content in the predicate input, which must include the cell input(0, s_0) with the symbol s_0 under the head's initial position. If and only if M halts on this input, there exists a time-point i at which ϕ_M is satisfied by at least one valuation (i, s). Moreover, the satisfying valuations at i represent the final state of the tape.
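The head-relative representation used by cfg can be illustrated with a small Python sketch that performs one TM step per loop iteration, mirroring one step per time-point. This is only a model of the encoding, not the formula ϕ_M itself; the function name `simulate`, the step bound `max_steps`, and the example machine are assumptions for illustration.

```python
def simulate(delta, q0, qf, blank, tape, max_steps):
    """Run a deterministic TM on a tape indexed relative to the head.

    tape maps an offset i (cells to the right of the head, possibly
    negative) to a symbol. Returns (steps, tape) once the final state qf
    is reached, or None if the machine did not halt within max_steps.
    """
    q = q0
    tape = dict(tape)
    for step in range(max_steps + 1):
        if q == qf:
            return step, tape
        # Extend the tape by a blank symbol if the head cell is missing.
        s = tape.get(0, blank)
        q, s2, m = delta[(q, s)]
        tape[0] = s2
        # Moving the head by m cells shifts every offset by -m.
        tape = {i - m: sym for i, sym in tape.items()}
    return None

# Hypothetical machine: skip over 1s to the right, halt on the first blank (0).
delta = {(0, 1): (0, 1, 1), (0, 0): (1, 0, 0)}
halting = simulate(delta, q0=0, qf=1, blank=0, tape={0: 1, 1: 1}, max_steps=10)
looping = simulate({(0, 0): (0, 0, 1)}, q0=0, qf=1, blank=0, tape={}, max_steps=10)
```

The step bound exists only to keep the sketch terminating; the monitor has no such bound and simply never produces an output for a non-halting configuration.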

Algorithm
The restriction to past-guarded recursion allows for an efficient evaluation algorithm for LetPast formulas. It is efficient because no fixpoint iteration is required at individual time-points. To evaluate LetPast p := α in β, we first try to evaluate α for as many time-points as possible and then use the results to interpret p in β. This part is the same as for the non-recursive Let, but the evaluation of α itself differs. The syntactic monitorability condition guarantees that α at time-point i depends on the predicate p only for time-points strictly less than i. Specifically, we have defined mon (LetPast p := α in β) such that the progress of α's evaluation does not depend on p's progress beyond time-point i − 1. Therefore, we can evaluate α at time-point 0 without providing any table for p, then use the result to evaluate α at time-point 1, and so forth.
There are two cases that require care. First, if α contains future operators, multiple time-points may be evaluated at once. The above process must then be repeated within a single monitor step. Second, if α contains no future operators, α is evaluated at all time-points i < j, where j is the current trace prefix length. We could then attempt to evaluate α once more at time-point j using the table computed at j − 1 for p. However, this would not yield any further tables because all occurrences of p are below at least one past operator that tries to access the time-stamp at time-point j, which is not yet known. Therefore, this last evaluation attempt would needlessly traverse the formula state. We optimize this case and buffer α's result at time-point j − 1 until the next input database arrives.
It is crucial that the evaluation of a recursive let does not get stuck waiting for tables that it needs to produce itself. Therefore, all operators that are strictly past-guarding as defined by slp (Fig. 2) must be well-behaved: the evaluation algorithm must compute a result at time-point i < j even if the operands' results are available only for time-points i′ < i. In particular, S_I without 0 in the interval is considered strictly past-guarding. We have modified VeriMon's evaluation algorithm for α S_I β to achieve this behavior.
The inductive state SLetPast p m s_α s_β i buf for a recursive let operator extends SLet with a counter i :: nat, which tracks the progress of p as observed by s_α, and an optional buffer buf :: table option. The meaning of the other arguments is the same as for SLet. In the initial state, i is zero and buf is ⊥. Let the function list_opt map ⊥ to [] and ⌊x⌋ to [x], where ⌊x⌋ is the embedding of x into the option type. A single monitor step updates this state through eval (see Section 3 for a description of eval's interface). The heavy lifting is performed by eval_LP, which is mutually recursive with eval. We forward relevant variables from eval. The accumulator xs :: table list collects s_α's results.
First, eval_LP evaluates s_α with dbs updated at p using the current buffer, which may be empty. Since i tracks p's progress, we then increase its new value i′ by the length of buf. The evaluation results in a list xs′ of tables and a new state s′_α. We continue to iterate eval_LP only if two conditions are met: xs′ must be nonempty, as otherwise there is no new data to evaluate s_α on, and i′ + 1 must be less than the current input prefix length. The latter condition serves as an obvious termination criterion, although it is stricter than necessary. We could perform an additional iteration in the case that i′ + 1 = j. However, such an iteration would never produce new results because the past operators guarding p can only be evaluated further if there are new time-stamps. Therefore, we optimize this case by choosing the stricter condition. If we continue the iteration, we append xs′ to the accumulator xs. Moreover, we clear tss and dbs because all tables from the new input database have already been processed by the first call to eval. Specifically, the function clear_dbs dbs updates dbs at all points at which it is defined to an empty list.
We illustrate our algorithm with an example, tracing the computations of eval and eval_LP. We evaluate LetPast p(x) := q(x) ∨ ● p(x) in p(x), which has the same semantics as ⧫_[0,∞] q(x), on a prefix with two time-points at time-stamps 0 and 3. We omit details about the subformulas' states, as well as brackets around singleton lists, i.e., [1] is displayed as 1. Let dbs_0 = {q ↦ [{1}, {2}]} be the content of the trace prefix.
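The feedback of p's tables into the evaluation of its own definition can be modeled by a minimal Python sketch. It assumes that the recursive occurrence of p is guarded by a previous operator, as the past-guardedness condition requires, and it models tables as plain sets of values; the function name `eval_letpast_once` is hypothetical and the sketch ignores VeriMon's state, buffering, and time-stamps.

```python
def eval_letpast_once(q_tables):
    """Evaluate p(x) := q(x) OR (previous p(x)) time-point by time-point.

    q_tables: list with one set of values per time-point.
    Returns the list of tables for p, one per time-point.
    """
    p_tables = []
    previous_p = set()  # no table for p is needed at time-point 0
    for q_table in q_tables:
        current_p = set(q_table) | previous_p
        p_tables.append(current_p)
        previous_p = current_p  # feed the result back for the next time-point
    return p_tables

p_tables = eval_letpast_once([{1}, {2}])

# Reference: the unbounded once operator collects all earlier values of q.
once_tables = [set().union(*[{1}, {2}][: k + 1]) for k in range(2)]
```

In this model the table for p at each time-point is computed from the table at the preceding time-point alone, so no fixpoint iteration is needed within a time-point.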
Correctness. We extended the correctness proof of eval (Thm. 1) to cover the new state constructor SLetPast. The added case differs from the one for the non-recursive let in that eval_LP is used to evaluate the first subformula. The proof also required additional invariants for the i and buf arguments of SLetPast, as well as a characterization of LetPast's progress. Recall that progress describes the number of time-points that the monitor is able to evaluate given a trace prefix of length j. We express the progress of the let-bound predicate p, which is defined in terms of α, as a least fixpoint:

$$\mathit{prog}_{LP}\ \sigma\ P\ p\ \alpha\ j = \min\,\{i.\ i = \mathit{prog}\ \sigma\ (P[p \mapsto i])\ \alpha\ j\}$$

$$\mathit{prog}\ \sigma\ P\ (\mathsf{LetPast}\ p := \alpha\ \mathsf{in}\ \beta)\ j = \mathit{prog}\ \sigma\ (P[p \mapsto \mathit{prog}_{LP}\ \sigma\ P\ p\ \alpha\ j])\ \beta\ j$$

(We do not update σ in these definitions as progress depends only on the time-stamp sequence but not on the databases in σ.) The above characterization follows the iteration in eval_LP: Since prog is pointwise monotone in P and at most j (both facts we prove in the formalization), the fixpoint can be reached by iteratively computing $\mathit{prog}\ \sigma\ (P[p \mapsto i])\ \alpha\ j$ starting with i = 0. Similarly, eval_LP starts by evaluating α with no data for p and it feeds the results back into the evaluation until no further results can be obtained. Theorem 2 remains true after adding the above equation to prog.

[…] differently from those in the rules' usages. After renaming the variables in the let-bound predicates of these two formulas, the issue was fixed and we restarted the experiments.
The evaluation results (Figure 3) show that the performance of DejaVu and VeriMon is incomparable: each tool outperforms the other on some formulas. VeriMon outperforms DejaVu on the formulas Once and OnceLet and scales well on PBLet, which, together with the Trans⁺ formula, we could not express in PFLTL with recursion. DejaVu outperforms VeriMon on the Spawn and Trans formulas. For these formulas, VeriMon's time complexity of processing one event is linear in the trace length: the number N of valuations satisfying the recursive predicates grows linearly in the trace length, and the time complexity of updating the recursive predicate is linear in N. We conjecture based on some preliminary experiments that VeriMon's performance can be significantly improved by optimizing the representation of sets of tuples in two ways: (a) using tuples of a fixed length with a fixed assignment of variables to positions in a tuple (i.e., no De Bruijn indices); (b) using a collection of indices to optimize the computation of joins on various sets of shared columns. Nevertheless, processing one event is unlikely to become independent of the trace length: Trans encodes the incremental dynamic transitive closure graph problem, with the best known algorithm processing every new edge in the input in amortized linear time (in the graph's maximum out-degree) [23].

Conclusion
We have presented the extension of a monitor for MFOTL with non-recursive and past-recursive let operators. The presence of bounded future temporal operators complicates both the semantics and the evaluation algorithms for the new constructs, compared to earlier unverified extensions of past-only monitors [14]. Yet, the formal correctness proofs that we have carried out ensure the trustworthiness of our development.
As future work, we plan to improve the performance of evaluating expensive joins by introducing indices, as used in database management systems. Regarding expressiveness, we will consider further relaxing the requirements on the recursive let. We can omit the past guard if we define a Datalog-style fragment for which the fixpoint is well-defined. Beyond relaxing guards, we may want to allow recursion through future operators in certain situations. The main challenge is that this would make the progress notion data-dependent (unlike currently, where it only depends on the time-stamps).

Fig. 3. Execution times of the monitors in seconds (TO = timeout of 120 seconds)