Symbolic Monitoring against Specifications Parametric in Time and Data

Monitoring consists in deciding whether a log meets a given specification. In this work, we propose an automata-based formalism to monitor logs in the form of actions associated with time stamps and arbitrary data values over infinite domains. Our formalism uses both timing parameters and data parameters, and is able to output answers symbolic in these parameters and in the log segments where the property is satisfied or violated. We implemented our approach in an ad-hoc prototype SyMon, and experiments show that its high expressive power still allows for efficient online monitoring.


Introduction
Monitoring consists in checking whether a sequence of data (a log or a signal) satisfies or violates a specification expressed using some formalism. Offline monitoring consists in performing this analysis after the system execution, as the technique has access to the entire log in order to decide whether the specification is violated. In contrast, online monitoring can make a decision earlier, ideally as soon as a witness of the violation of the specification is encountered.
Using existing formalisms (e. g., the metric first order temporal logic [BKMZ15b]), one can check whether a given bank customer withdraws more than 1,000 € every week. With formalisms extended with data, one may even identify such customers. Or, using an extension of the signal temporal logic (STL) [BDSV14], one can ask: "is it true that the value of variable x is always copied to y exactly 4 time units later?" However, questions relating time and data using parameters become much harder (or even impossible) to express using existing formalisms: "what are the users and time frames during which a user withdraws more than half of the total bank withdrawals within seven days?" And even, can we synthesize the durations (not necessarily 7 days) for which this specification holds? Or "what is the set of variables for which there exists a duration within which their value is always copied to another variable?" In addition, detecting periodic behaviors without knowing the period can be hard to achieve using existing formalisms.
In this work, we address the challenging problem of monitoring logs enriched with both timing information and (infinite domain) data. In addition, we significantly push the existing limits of expressiveness so as to allow for a further level of abstraction using parameters: our specification can be parametric both in time and in data. The answer to this symbolic monitoring is richer than a pure Boolean answer, as it synthesizes the values of both time and data parameters for which the specification holds. This allows us notably to detect periodic behaviors without knowing the period while being symbolic in terms of data. For example, we can synthesize variable names (data) and delays for which variables will have their value copied to another variable within the aforementioned delay. In addition, we show that we can detect the log segments (start and end date) for which a specification holds.
Example 1. Consider a system updating three variables a, b and c (i. e., strings) to values (rationals). An example of log is given in Fig. 1a. Although our work is event-based, we can give a graphical representation similar to that of signals in Fig. 1b. Consider the following property: "for any variable px, whenever an update of that variable occurs, then within strictly less than tp time units, the value of variable b must be equal to that update". In our formalism, a simple automaton made of 4 locations (given in Fig. 1c) can monitor this property. The variable parameter px is compared with string values and the timing parameter tp is used in the timing constraints. We are interested in checking for which values of the variable parameter px and the timing parameter tp this property is violated. This can be seen as a synthesis problem in both the variable and timing parameters. For example, px = c and tp = 1.5 is a violation of the specification, as the update of c to 2 at time 4 is not propagated to b within 1.5 time units. Our algorithm outputs such a violation as a constraint, e. g., px = c ∧ tp ≤ 2. In contrast, the value of any signal at any time is always such that either b is equal to that signal, or the value of b will be equal to that value within at most 2 time units. Thus, the specification holds for any valuation of the variable parameter px, provided tp > 2.
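To make the property of Example 1 concrete, the following minimal sketch checks it for one fixed valuation of px and tp, i. e., it is a non-symbolic, brute-force check and not the algorithm of this paper. The log LOG is hypothetical (it is not the log of Fig. 1a, which is not reproduced here), the function name violates is ours, and we assume that a finite log is simply read to its end.

```python
from fractions import Fraction

# Hypothetical log (NOT the log of Fig. 1a): (timestamp, variable, value),
# sorted by timestamp.
LOG = [
    (1, "a", 0),
    (2, "b", 0),
    (4, "c", 2),
    (6, "b", 2),
    (7, "a", 2),
]

def violates(log, px, tp):
    """True if some update of px is not matched by b within < tp time units."""
    for t, name, value in log:
        if name != px:
            continue
        # Value of b at time t: the latest update of b at or before t.
        b_now = None
        for t2, name2, value2 in log:
            if name2 == "b" and t2 <= t:
                b_now = value2
        if b_now == value:
            continue
        # Otherwise b must be updated to `value` at some t2 with t <= t2 < t + tp.
        if not any(n == "b" and v == value and t <= t2 < t + tp
                   for t2, n, v in log):
            return True
    return False

print(violates(LOG, "c", Fraction(3, 2)))  # the update of c at time 4 is matched too late
print(violates(LOG, "c", Fraction(5, 2)))  # b becomes 2 at time 6 < 4 + 5/2
```

On this hypothetical log, the concrete check agrees with the shape of the symbolic answer discussed above: px = c is a violation exactly when tp ≤ 2. Symbolic monitoring returns such a constraint directly instead of testing valuations one by one.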
We believe our framework balances expressiveness and monitoring performance well: 1. Regarding expressiveness, comparison with the existing work is summarized in Table 1 (see Section 2 for further details). 2. Our monitoring is complete, in the sense that it returns a symbolic constraint characterizing all the parameter valuations that match a given specification. 3. We also achieve reasonable monitoring speed, especially given the degree of parametrization in our formalism.
Note that it is not easy to formally claim superiority in expressiveness: proofs would require arguments such as the pumping lemma; and such formal comparison does not seem to be a concern of the existing work. Moreover, such formal comparison bears little importance for industrial practitioners: expressivity via an elaborate encoding is hardly of practical use. We also note that, in the existing work, we often observe gaps between the formalism in a theory and the formalism that the resulting tool actually accepts. This is not the case with the current framework.
Outline After discussing related works in Section 2, we introduce the necessary preliminaries in Section 3, and our parametric timed data automata in Section 4. We present our symbolic monitoring approach in Section 5 and conduct experiments in Section 6. We conclude in Section 7.

Related works
Robustness and monitoring Robust (or quantitative) monitoring extends the binary question whether a log satisfies a specification by asking "by how much" the specification is satisfied. The quantification of the distance between a signal and a signal temporal logic (STL) specification has been addressed in, e. g., [FP09,DM10,Don10,DFM13,DMP17,JBG+18] (or in a slightly different setting in [ALFS11]). The distance can be understood in terms of space ("signals") or time. In [ABD18], the distance also copes with reordering of events. In [BFMU17], the robust pattern matching problem is considered over signal regular expressions, by quantifying the distance between the signal regular expression specification and the segments of the signal. For piecewise-constant and piecewise-linear signals, the problem can be effectively solved using a finite union of convex polyhedra. While our framework does not fit in robust monitoring, we can simulate both the robustness w.r.t. time (using timing parameters) and w.r.t. data, e. g., signal values (using data parameters).
Monitoring with data The tool MarQ [RCR15] performs monitoring using Quantified Event Automata (QEA) [BFH+12]. This approach and ours share the automata-based framework, the ability to express some first-order properties using "events containing data" (which we encode using local variables associated with actions), and the ability to quantify data. However, [RCR15] does not seem to natively support specifications parametric in time; in addition, [RCR15] does not perform complete ("symbolic") parameter synthesis, but outputs the violating entries of the log. The metric first order temporal logic (MFOTL) allows for a high expressiveness by allowing universal and existential quantification over data, which can be seen as a way to express parameters. A monitoring algorithm is presented for a safety fragment of MFOTL in [BKMZ15b]. Aggregation operators are added in [BKMZ15a], allowing to compute sums or maximums over data. A fragment of this logic is implemented in MonPoly [BKZ17]. While these works are highly expressive, they do not natively consider timing parameters; in addition, MonPoly does not output symbolic answers, i. e., symbolic conditions on the parameters to ensure validity of the formula.
In [HPU17], binary decision diagrams (BDDs) are used in order to symbolically represent the observed data in QTL. This can be seen as monitoring data against a parametric specification, with a symbolic internal encoding (the BDDs of [HPU17,HP18] work efficiently for comparing whether a variable is equal or not equal to another, but not for comparing whether a variable is smaller than another one, which suits strings better than rationals). However, their implementation DejaVu only outputs concrete answers. In contrast, we are able to provide symbolic answers (both in timing and data parameters), e. g., in the form of unions of polyhedra for rationals, and unions of string constraints using equalities (=) and inequalities (≠).
Freeze operator In [BDSV14], the STL logic is extended with a freeze operator that can "remember" the value of a signal, to compare it to a later value of the same signal. This logic STL* can express properties such as "In the initial 10 seconds, x copies the values of y within a delay of 4 seconds": G_[0,10] *(G_[0,4] y* = x). While the setting is somewhat different (STL* operates over signals while we operate over timed data words), requirements such as the one above can easily be encoded in our framework. In addition, we are able to synthesize the delay within which the values are always copied, as in Example 1. In contrast, it is not possible to determine using STL* which variables and which delays satisfy or violate the specification.
Monitoring with parameters In [ADMN11], a log in the form of a dense-time real-valued signal is tested against a parameterized extension of STL, where parameters can be used to model uncertainty both in signal values and in timing values. The output comes in the form of a subset of the parameter space for which the formula holds on the log. In [BFM18], the focus is only on signal parameters, with an improved efficiency by reusing techniques from robust monitoring. Whereas [ADMN11,BFM18] fit in the framework of signals and temporal logics while we fit in words and automata, our work shares similarities with [ADMN11,BFM18] in the sense that we can express data parameters; in addition, [BFM18] is able, as in our work, to exhibit the segment of the log associated with the parameter valuations for which the specification holds. A main difference however is that we can use memory and aggregation, thanks to arithmetic on variables.
In [FR08], the problem of inferring temporal logic formulae with constraints that hold in a given numerical data time series is addressed. The method is applied to biological systems.
Timed pattern matching A recent line of work is that of timed pattern matching, which takes as input a log and a specification, and decides where in the log the specification is satisfied or violated. On the one hand, a line of works considers signals, with specifications either in the form of timed regular expressions [UFAM14,UFAM16,Ulu17,BFN+18], or a temporal logic [UM18]. On the other hand, a line of works considers timed words, with specifications in the form of timed automata [WHS17,AHW18]. We will see that our work can also encode parametric timed pattern matching. Therefore, our work can be seen as a two-dimensional extension of both lines of works: first, we add timing parameters (note that [AHW18] also considers similar timing parameters) and, second, we add data, themselves extended with parameters. That is, coming back to Example 1, [UFAM14,UFAM16,Ulu17,WHS17] could only infer the segments of the log for which the property is violated for a given (fixed) variable and a given (fixed) timing parameter; while [AHW18] could infer both the segments of the log and the timing parameter valuations, but not which variable violates the specification.
Summary We compare related works with our work in Table 1. "Timing parameters" denote the ability to synthesize unknown constants used in timing constraints (e. g., modality intervals, or clock constraints). "?" denotes works not natively supporting this, although it might be encoded. The term "Data" refers to the ability to manage logs over infinite domains (apart from timestamps). For example, the log in Fig. 1a features, beyond timestamps, both strings (variable names) and rationals (values). Also, works based on real-valued signals are naturally able to manage (at least one type of) data. "Parametric data" refer to the ability to express formulas where data (including signal values) are compared to (quantified or unquantified) variables or unknown parameters; for example, in the log in Fig. 1a, an example of property parametric in data is to synthesize the parameters for which the difference of values between two consecutive updates of variable px is always below pv, where px is a string parameter and pv a rational-valued parameter. "Memory" is the ability to remember past data; this can be achieved using e. g., the freeze operator of STL*, or variables (e. g., in [RCR15,BKMZ15b,HPU17]). "Aggregation" is the ability to aggregate data using operators such as sum or maximum; this allows to express properties such as "A user must not withdraw more than $10,000 within a 31 day period" [BKMZ15a]. This can be supported using dedicated aggregation operators [BKMZ15a] or using variables ([RCR15], and our work). "Complete parameter identification" denotes the synthesis of the set of parameters that satisfy or violate the property. Here, "N/A" denotes the absence of parameters [BDSV14], or the case when parameters are used in a way (existentially or universally quantified) such that the identification is not explicit (instead, the position of the log where the property is violated is returned [HPU17]).
In contrast, we return in a symbolic manner (as in [ADMN11,AHW18]) the exact set of (data and timing) parameters for which a property is satisfied. "√/×" denotes "yes" in the theory paper, but not in the associated tool.

Clocks, timing parameters and timed guards
We assume a set C = {c_1, ..., c_H} of clocks, i. e., real-valued variables that evolve at the same rate. A clock valuation is a function ν : C → R_{≥0}. We write 0 for the clock valuation assigning 0 to all clocks.
We assume a set TP = {tp_1, ..., tp_J} of timing parameters, i. e., unknown timing constants. A timing parameter valuation γ is a function γ : TP → Q_+. We assume ⊲⊳ ∈ {<, ≤, =, ≥, >}. A timed guard tg is a constraint over C ∪ TP defined by a conjunction of inequalities of the form c ⊲⊳ d, or c ⊲⊳ tp with d ∈ N and tp ∈ TP. Given tg, we write ν |= γ(tg) if the expression obtained by replacing each c with ν(c) and each tp with γ(tp) in tg evaluates to true.
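The satisfaction relation ν |= γ(tg) can be illustrated by a minimal sketch. The encoding is an assumption made for illustration only: a timed guard is a list of triples (clock, operator, right-hand side), where the right-hand side is either a natural number d or the name of a timing parameter; the names OPS and satisfies are ours.

```python
from fractions import Fraction
import operator

# The comparison operators allowed in timed guards.
OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
       ">=": operator.ge, ">": operator.gt}

def satisfies(nu, gamma, guard):
    """Decide nu |= gamma(tg) for a conjunction of atomic constraints.

    guard is a list of (clock, op, rhs); rhs is an int constant d
    or the name (str) of a timing parameter.
    """
    for clock, op, rhs in guard:
        # Replace a timing parameter by its valuation, keep constants as-is.
        bound = gamma[rhs] if isinstance(rhs, str) else rhs
        if not OPS[op](nu[clock], bound):
            return False
    return True

nu = {"c1": Fraction(3, 2), "c2": Fraction(4)}   # clock valuation
gamma = {"tp1": Fraction(2)}                     # timing parameter valuation
guard = [("c1", "<", "tp1"), ("c2", ">=", 3)]    # c1 < tp1 and c2 >= 3
print(satisfies(nu, gamma, guard))
```

Exact rationals (Fraction) are used so that the valuations stay in Q_+ without floating-point rounding.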

Variables, data parameters and data guards
For the sake of simplicity, we assume a single infinite domain D for data. The formalism defined in Section 4 can be extended in a straightforward manner to different domains for different variables (and our implementation SyMon does allow for different types). The case of a finite data domain is immediate too. We however define this formalism in an abstract manner, so as to allow a sort of parameterized domain.
We assume a set V = {v_1, ..., v_M} of variables valued over D. These variables are internal variables, which allow a high expressive power in our framework, as they can be compared or updated to other variables or parameters. We also assume a set LV = {lv_1, ..., lv_O} of local variables valued over D. These variables will only be used locally along a transition in the "argument" of the action (e. g., x and v in update(x, v)), and in the associated guard and (right-hand part of) updates. We assume a set VP = {vp_1, ..., vp_N} of data parameters, i. e., unknown variable constants.
A data type (D, DE, DU) is made of 1. an infinite domain D, 2. a set of admissible Boolean expressions DE (that may rely on V, LV and VP), which will define the type of guards over variables in our subsequent automata, and 3. a domain for updates DU (that may rely on V, LV and VP), which will define the type of updates of variables in our subsequent automata.
Example 2. As a first example, let us define the data type for rationals. We have D = Q. Let us define Boolean expressions. A rational comparison is a constraint over V ∪ LV ∪ VP defined by a conjunction of inequalities of the form […]. DE is the set of all rational comparisons over V ∪ LV ∪ VP. Let us then define updates. First, a linear arithmetic expression over V ∪ LV ∪ VP is […]. Let LA(V ∪ LV ∪ VP) denote the set of arithmetic expressions over V, LV and VP. We then have DU = LA(V ∪ LV ∪ VP).
As a second example, let us define the data type for strings. We have D = S, where S denotes the set of all strings. A string comparison is a constraint over […]; a string variable can be assigned another string variable, or a concrete string.
A variable valuation is a function µ : V → D. A local variable valuation is a partial function η : LV ⇀ D. A data parameter valuation ζ is a function ζ : VP → D. Given a data guard dg ∈ DE, a variable valuation µ, a local variable valuation η defined for the local variables in dg, and a data parameter valuation ζ, we write (µ, η) |= ζ(dg) if the expression obtained by replacing within dg all occurrences of each data parameter vp_i by ζ(vp_i) and all occurrences of each variable v_j (resp. local variable lv_k) with its concrete valuation µ(v_j) (resp. η(lv_k)) evaluates to true.
A parametric data update is a partial function PDU : V ⇀ DU. That is, we can assign to a variable an expression over data parameters and other variables, according to the data type. Given a parametric data update PDU, a variable valuation µ, a local variable valuation η (defined for all local variables appearing in PDU), and a data parameter valuation ζ, we define [µ]_{η(ζ(PDU))} : V → D as follows: [µ]_{η(ζ(PDU))}(v) = η(µ(ζ(PDU(v)))) if PDU(v) is defined, and [µ]_{η(ζ(PDU))}(v) = µ(v) otherwise, where η(µ(ζ(PDU(v)))) denotes the replacement within the update expression PDU(v) of all occurrences of each data parameter vp_i by ζ(vp_i), and all occurrences of each variable v_j (resp. local variable lv_k) with its concrete valuation µ(v_j) (resp. η(lv_k)). Observe that this replacement gives a value in D, therefore the result of [µ]_{η(ζ(PDU))} is indeed a variable valuation V → D. That is, [µ]_{η(ζ(PDU))} computes the new (non-parametric) variable valuation obtained after applying to µ the partial function PDU valuated with ζ.
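The application of a parametric data update can be sketched as follows for the data type of rationals. The encoding is hypothetical: a linear arithmetic expression is a pair (coefficients, constant), a parametric data update is a dictionary from the updated variables to such expressions, and the names evaluate and apply_update are ours. The update used at the end is an illustrative one chosen by us.

```python
from fractions import Fraction

def evaluate(expr, mu, eta, zeta):
    """Evaluate a linear expression ({name: coeff}, const) by replacing data
    parameters with zeta, variables with mu and local variables with eta."""
    coeffs, const = expr
    env = {**zeta, **mu, **eta}  # names are assumed to be pairwise distinct
    return const + sum(coeff * env[name] for name, coeff in coeffs.items())

def apply_update(pdu, mu, eta, zeta):
    """Return the new variable valuation: updated variables take the value of
    their expression (computed from the OLD mu), the others keep mu(v)."""
    return {v: evaluate(pdu[v], mu, eta, zeta) if v in pdu else mu[v]
            for v in mu}

mu = {"v1": Fraction(1), "v2": Fraction(2)}   # variable valuation
eta = {"lv1": Fraction(2)}                    # lv2 is left undefined
zeta = {"vp1": Fraction(1)}                   # data parameter valuation
# Hypothetical update: v1 := v1 + lv1 + vp1, and v2 is not updated.
pdu = {"v1": ({"v1": 1, "lv1": 1, "vp1": 1}, 0)}
print(apply_update(pdu, mu, eta, zeta))
```

All right-hand sides are evaluated against the old valuation µ, so simultaneous updates such as swapping two variables behave as expected.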
Example 3. Consider the data type for rationals, the variables set {v 1 , v 2 }, the local variables set {lv 1 , lv 2 } and the parameters set {vp 1 }. Let µ be the variable valuation such that µ(v 1 ) = 1 and µ(v 2 ) = 2, and η be the local variable valuation such that η(lv 1 ) = 2 and η(lv 2 ) is not defined. Let ζ be the data parameter valuation such that ζ(vp 1 ) = 1. Consider the parametric data update function PDU such that PDU(

Parametric timed data automata
We introduce here Parametric timed data automata (PTDAs). They can be seen as an extension of parametric timed automata [AHV93] (that extend timed automata [AD94] with parameters in place of integer constants) with unbounded data variables and parametric variables. PTDAs can also be seen as an extension of some extensions of timed automata with data (see e. g., [BER94,Dan03,Qua15]), that we again extend with both data parameters and timing parameters. They can also be seen as an extension of quantified event automata [BFH+12] with explicit time representation using clocks, and further augmented with timing parameters. PTDAs feature both timed guards and data guards; we summarize the various variables and parameters types together with their notations in Table 2.

Syntax
We will associate local variables with actions (which can be seen as predicates).
Let Dom : Σ → 2^LV denote the set of local variables associated with each action. Let Var(dg) (resp. Var(PDU)) denote the set of variables occurring in dg (resp. PDU). Given a data parameter valuation ζ and a timing parameter valuation γ, we denote by γ|ζ(A) the resulting timed data automaton (TDA), i. e., the nonparametric structure where all occurrences of a parameter vp_i (resp. tp_j) have been replaced by ζ(vp_i) (resp. γ(tp_j)).

Semantics
We now equip our TDAs with a concrete semantics.
A finite run is accepting if its last state (ℓ, µ, ν) is such that ℓ ∈ F . The language L(γ|ζ(A)) is defined to be the set of timed data words associated with all accepting runs of γ|ζ(A).
The associated timed data word is (open, 2046, η_0), (open, 2136, η_1), (close, 2166, η_2). Since each action is associated with a set of local variables, given an ordering on this set, it is possible to see a given action and a variable valuation as a predicate: for example, assuming an ordering of LV such that f precedes m, then open with η_0 can be represented as open(Hakuchi.txt, rw). Using this convention, the log in Fig. 2a corresponds exactly to this timed data word.

Symbolic monitoring against PTDA specifications
In symbolic monitoring, in addition to the (observable) actions in Σ, we employ unobservable actions denoted by ε and satisfying Dom(ε) = ∅. We write Σ_ε for Σ ⊔ {ε}. We let η_ε be the local variable valuation such that η_ε(lv) is undefined for any lv ∈ LV. For a timed data word w = (a_1, τ_1, η_1), (a_2, τ_2, η_2), ..., (a_n, τ_n, η_n) over Σ_ε, the projection w↓_Σ is the timed data word over Σ obtained from w by removing any triple (a_i, τ_i, η_i) where a_i = ε. An edge e = (ℓ, tg, dg, a, R, PDU, ℓ′) ∈ E is unobservable if a = ε, and observable otherwise. The use of unobservable actions makes symbolic monitoring more general, and allows us in particular to encode parametric timed pattern matching (see Section 5.3).
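The projection w↓_Σ amounts to filtering out the unobservable triples. A minimal sketch follows, where the encoding of ε as None, the encoding of a timed data word as a list of triples, and the name project are assumptions made for illustration.

```python
EPSILON = None  # hypothetical encoding of the unobservable action

def project(word):
    """Remove every triple (a, tau, eta) whose action a is unobservable."""
    return [(a, tau, eta) for (a, tau, eta) in word if a is not EPSILON]

# A word over the actions extended with EPSILON; the local variable valuation
# of an unobservable triple is empty since Dom(EPSILON) is empty.
w = [
    ("update", 1, {"x": "a", "v": 0}),
    (EPSILON, 2, {}),
    ("update", 4, {"x": "c", "v": 2}),
]
print(project(w))
```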
We make the following assumption on the PTDAs in symbolic monitoring.

Problem definition
Roughly speaking, given a PTDA A and a timed data word w, the symbolic monitoring problem asks for the set of pairs (γ, ζ) ∈ (Q_+)^TP × D^VP satisfying w(1, i) ∈ L(γ|ζ(A)), where w(1, i) is a prefix of w. Since A also contains unobservable edges, we consider w′, which is w augmented by unobservable actions.
Example 7. Consider the PTDA A and the timed data word w shown in Fig. 1.
For the data types in Example 2, the validity domain D(w, A) can be represented by a constraint of finite size because the length |w| of the timed data word is finite.

Online algorithm
Our algorithm is online in the sense that it outputs (γ, ζ) ∈ D(w, A) as soon as its membership is witnessed, even before reading the whole timed data word w.
Algorithm 1 shows an outline of our algorithm for symbolic monitoring (see Appendix A for the full version). Our algorithm incrementally computes Conf^u_{i−1} and Conf^o_i (line 3). After reading (a_i, τ_i, η_i), our algorithm stores the partial results (γ, ζ) ∈ D(w, A) witnessed from the accepting configurations in Conf^u_{i−1} and Conf^o_i (line 4). (We also need to try to take potential unobservable transitions and store the results from the accepting configurations after the last element of the timed data word (lines 5 and 6).) Since (Q_+)^TP × D^VP is an infinite set, we cannot try each (γ, ζ) ∈ (Q_+)^TP × D^VP and we use a symbolic representation for parameter valuations. Similarly to the reachability synthesis of parametric timed automata [JLR15], a set of clock and timing parameter valuations can be represented by a convex polyhedron. For variable valuations and data parameter valuations, we need an appropriate representation depending on the data type (D, DE, DU). Moreover, for the termination of Algorithm 1, some operations on the symbolic representation are required.
Theorem 1 (termination). For any PTDA A over a data type (D, DE, DU) and actions Σ_ε, and for any timed data word w over Σ, Algorithm 1 terminates if the following operations on the symbolic representation V_d of a set of variable and data parameter valuations terminate: 1. restriction and update of V_d with respect to η, PDU and dg, where η is a local variable valuation, PDU is a parametric data update function, and dg is a data guard; 2. emptiness checking of V_d; 3. projection V_d↓_VP of V_d to the data parameters VP.
Example 8. For the data type for rationals in Example 2, variable and data parameter valuations V_d can be represented by convex polyhedra and the above operations terminate. For the data type for strings S in Example 2, variable and data parameter valuations V_d can be represented by S^|V| × (S ∪ P_fin(S))^|VP| and the above operations terminate, where P_fin(S) is the set of finite sets of S.
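The string representation of Example 8 can be illustrated for a single data parameter. This is a minimal sketch under an interpretation that we assume here: an element of S means that the parameter is fixed to that string, and an element of P_fin(S) is read as the finite set of strings the parameter must differ from. The function names and the use of None for the empty set of valuations are ours, and this is not the data structure of SyMon.

```python
def restrict_eq(sym, s):
    """Restrict a symbolic parameter value by the constraint vp = s.

    sym is either a str (parameter fixed) or a frozenset of excluded strings.
    Returns None when no valuation remains (emptiness).
    """
    if isinstance(sym, str):
        return sym if sym == s else None
    return None if s in sym else s

def restrict_neq(sym, s):
    """Restrict a symbolic parameter value by the constraint vp != s."""
    if isinstance(sym, str):
        return sym if sym != s else None
    return sym | {s}  # still a finite set of excluded strings

unconstrained = frozenset()                 # no string excluded yet
step1 = restrict_neq(unconstrained, "a")    # vp != a
step2 = restrict_neq(step1, "b")            # vp != a and vp != b
print(step2, restrict_eq(step2, "c"), restrict_eq(step2, "a"))
```

Since S is infinite, a finite exclusion set never denotes the empty set, so emptiness only arises when an equality contradicts the current value; each operation touches a finite set and therefore terminates.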

Encoding parametric timed pattern matching
The symbolic monitoring problem is a generalization of the parametric timed pattern matching problem of [AHW18]. Recall that parametric timed pattern matching aims at synthesizing timing parameter valuations and start and end times in the log for which a log segment satisfies or violates a specification. In our approach, by adding a clock measuring the absolute time, and two timing parameters encoding respectively the start and end date of the segment, one can easily infer the log segments for which the property is satisfied. We note that even with Assumption 1, symbolic monitoring is still a generalization of parametric timed pattern matching.
Consider the Dominant PTDA (left of Fig. 3). It is inspired by a monitoring of withdrawals from bank accounts of various users [BKZ17]. This PTDA monitors situations when a user withdraws more than half of the total withdrawals within a time window of (50, 100). The actions are Σ = {withdraw} and Dom(withdraw) = {n, a}, where n has a string value and a has an integer value. The string n represents a user name and the integer a represents the amount of the withdrawal by the user n. Observe that clock c is never reset, and therefore measures absolute time. The automaton can non-deterministically remain in ℓ_0, or start to measure a log by taking the ε-transition to ℓ_1 checking c = tp_1, and therefore "remembering" the start time using timing parameter tp_1. Then, whenever a user vp has withdrawn more than half of the accumulated withdrawals (data guard 2v_1 > v_2) in a (50, 100) time window (timed guard c − tp_1 ∈ (50, 100)), the automaton takes an ε-transition to the accepting location, checking c = tp_2, and therefore remembering the end time using timing parameter tp_2.
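For one fixed valuation of vp, tp_1 and tp_2, the condition monitored by Dominant can be checked directly. The sketch below is a concrete, non-symbolic check and not the algorithm of this paper; the log is hypothetical, the function name dominant is ours, and we assume that withdrawals occurring exactly at tp_1 or tp_2 belong to the segment.

```python
def dominant(log, vp, tp1, tp2):
    """Check whether user vp withdraws more than half of the total amount
    withdrawn in the segment [tp1, tp2], with tp2 - tp1 in (50, 100)."""
    if not 50 < tp2 - tp1 < 100:      # timed guard on the window length
        return False
    v1 = v2 = 0                       # v1: amount by vp, v2: amount by all users
    for tau, name, amount in log:
        if tp1 <= tau <= tp2:         # assumption: boundaries are inclusive
            v2 += amount
            if name == vp:
                v1 += amount
    return 2 * v1 > v2                # data guard 2 v1 > v2

# Hypothetical log: (timestamp, user name, amount).
log = [(10, "alice", 100), (30, "bob", 40), (60, "alice", 50), (90, "bob", 20)]
print(dominant(log, "alice", 5, 95), dominant(log, "bob", 5, 95))
```

Symbolic monitoring avoids enumerating the triples (vp, tp_1, tp_2): it returns a constraint describing all of them at once.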

Experiments
We implemented our symbolic monitoring algorithm in a tool SyMon in C++ (compiled using GCC 7.3.0), where the data domains are the strings and the integers. For the strings, we used the data type in Example 2 and for integers, we used the data type for the rationals in Example 2, where every occurrence of Q is replaced by Z. Our tool SyMon is distributed at https://github.com/MasWag/symon. We use the Parma Polyhedra Library (PPL) [BHZ08] for the symbolic representation of the valuations. We note that we employ an optimization to merge adjacent polyhedra in the configurations if possible. We evaluated our monitoring algorithm against three original benchmarks: the PTDA in Copy is in Fig. 1c; and the PTDAs in Dominant and Periodic are shown in Fig. 3. We conducted the experiments on an Amazon EC2 c4.large instance (2.9 GHz Intel Xeon E5-2666 v3, 2 vCPUs, and 3.75 GiB RAM) that runs Ubuntu 18.04 LTS (64 bit).

Benchmark 1: Copy
Our first benchmark Copy is a monitoring of variable updates much like the scenario in [BDSV14]. The actions are Σ = {update} and Dom(update) = {n, v}, where n has a string value representing the name of the updated variable and v has an integer value representing the updated value. We generated random timed data words of various sizes. Our set W consists of 10 timed data words of length 4,000 to 40,000.
The PTDA in Copy is shown in Fig. 1c, where we give an additional constraint 3 < tp < 10 on tp. The property encoded in Fig. 1c is "for any variable px, whenever an update of that variable occurs, then within tp time units, the value of b must be equal to that update".
The experiment result is in Fig. 4. We observe that the execution time is linear in the number of events and the memory usage is more or less constant with respect to the number of events.

Benchmark 2: Dominant
Our second benchmark is Dominant (Fig. 3 left). We generated random timed data words of various sizes, where the number of users is 3 and the duration between each withdrawal follows the uniform distribution on {1, 2, ..., 10}. Our set W consists of 10 timed data words of length 2,000 to 20,000. Recall that this PTDA matches a situation when the amount of the withdrawals by the user vp in a certain time window is more than half of the withdrawals by all of the users in the same time window. The length of the time window must be between 50 and 100. The parameters tp_1 and tp_2 give the beginning and the end of the time window respectively. The experiment result is in Fig. 5. We observe that the execution time is linear in the number of events and the memory usage is more or less constant with respect to the number of events.

Benchmark 3: Periodic
Our third benchmark Periodic is inspired by a parameter identification of periodic withdrawals from one bank account. The actions are Σ = {withdraw} and Dom(withdraw) = {a}, where a has an integer value representing the amount of the withdrawal. We randomly generated a set W consisting of 10 timed data words of length 2,000 to 20,000. Each timed data word consists of the following three kinds of periodic withdrawals:
short period: One withdrawal occurs every 5 ± 1 time units. The amount of the withdrawal is 50 ± 3.
middle period: One withdrawal occurs every 50 ± 3 time units. The amount of the withdrawal is 1000 ± 40.
long period: One withdrawal occurs every 100 ± 5 time units. The amount of the withdrawal is 5000 ± 20.
The PTDA in Periodic is shown in the right of Fig. 3. The PTDA matches situations where, for any two successive withdrawals of amount more than vp, the duration between them is within [tp_1, tp_2]. By symbolic monitoring, one can identify that the period of the periodic withdrawals of amount greater than vp lies in [tp_1, tp_2]. An example of the validity domain is shown in the right figure.
The experiment result is in Fig. 5. We observe that the execution time is linear in the number of events and the memory usage is more or less constant with respect to the number of events.
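For a fixed threshold vp, the condition matched by Periodic reduces to bounding the gaps between successive large withdrawals. The following minimal sketch computes the tightest such bounds on a hypothetical log; it is a concrete computation for one value of vp, not the symbolic algorithm, and the name period_bounds is ours.

```python
def period_bounds(log, vp):
    """Return (min gap, max gap) between successive withdrawals of amount
    more than vp, or None if fewer than two such withdrawals exist."""
    times = [tau for tau, amount in log if amount > vp]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if not gaps:
        return None
    return min(gaps), max(gaps)

# Hypothetical log: (timestamp, amount), sorted by timestamp.
log = [(0, 5000), (5, 50), (10, 49), (52, 1000), (98, 5010), (101, 990), (201, 4995)]
print(period_bounds(log, 2000))  # only the long-period withdrawals remain
print(period_bounds(log, 900))   # middle and long periods are mixed
```

For a given vp, a valuation of (tp_1, tp_2) is then consistent with the log exactly when tp_1 is at most the smallest gap and tp_2 is at least the largest one; the validity domain describes this dependency for all values of vp at once.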

Discussion
First, a positive result is that our algorithm effectively performs symbolic monitoring on more than 10,000 actions in one or two minutes even though the PTDAs feature both timing and data parameters. The execution time in Copy is 50-100 times smaller than that in Dominant and Periodic. This is because the constraint 3 < tp < 10 in Copy is strict and the size of the configurations (i. e., Conf^o_i and Conf^u_i in Algorithm 1) is small. Another positive result is that in all of the benchmarks (Copy, Dominant, and Periodic), the execution time is linear and the memory usage is more or less constant in the size of the input word. This is because the size of the configurations (i. e., Conf^o_i and Conf^u_i in Algorithm 1) is bounded, for the following reason. In Dominant, the loop in ℓ_1 of the PTDA is deterministic, and because of the guard c − tp_1 ∈ (50, 100) in the edge from ℓ_1 to ℓ_2, the number of the loop edges at ℓ_1 in an accepting run is bounded (if the duration between two consecutive actions is bounded as in the current setting). Therefore, |Conf^o_i| and |Conf^u_i| in Algorithm 1 are bounded. The reason is similar in Copy, too. In Periodic, since the PTDA is deterministic and the valuations of the amount of the withdrawals are in finite number, |Conf^o_i| and |Conf^u_i| in Algorithm 1 are bounded. It is clear that we can design ad-hoc automata for which the execution time of symbolic monitoring can grow much faster (e. g., exponential in the size of the input word). However, experiments showed that our algorithm monitors various interesting properties in a reasonable time.
Copy and Dominant use data and timing parameters as well as memory and aggregation; from Table 1, no other monitoring tool can compute the valuations satisfying the specification. We however used the parametric timed model checker IMITATOR [AFKS12] to try to perform such a synthesis, by encoding the input log as a separate automaton; but IMITATOR ran out of memory (on a 3.75 GiB RAM computer) for Dominant with |w| = 2000, while SyMon terminates in 14 s with only 6.9 MiB for the same benchmark. Concerning Periodic, the only existing work that can possibly accommodate this specification is [ADMN11]. While the precise performance comparison is interesting future work (their implementation is not publicly available), we do not expect our implementation to be vastly outperformed: in [ADMN11], their tool times out (after 10 min.) for a simple specification ("E_[0,s2] G_[0,s1] (x < p)") and a signal discretized by only 128 points.
For those problem instances which MonPoly and DejaVu can accommodate (which are simpler and less parametrized than our benchmarks), they tend to run much faster than ours. For example, in [HPU17], it is reported that they can process a trace of length 1,100,004 in 30.3 seconds. The trade-off here is expressivity: for example, DejaVu does not seem to accommodate Dominant, because DejaVu does not allow for aggregation. We also note that, while SyMon can be slower than MonPoly and DejaVu, it is fast enough for many scenarios of real-world online monitoring.

Conclusion and perspectives
Conclusion We proposed a symbolic framework for monitoring using parameters both in data and time. Logs can use timestamps and infinite-domain data, while our monitor automata can use timing and variable parameters (in addition to clocks and local variables). In addition, our online algorithm can answer symbolically, by outputting all valuations (and possibly log segments) for which the specification is satisfied or violated. We implemented our approach in a prototype SyMon, and experiments showed that our tool can effectively monitor logs of tens of thousands of events in a short time.
Perspectives Combining the BDDs used in [HPU17] with some of our data types (typically strings) could improve our approach by making it even more symbolic. Also, taking advantage of the polarity of some parameters (typically the timing parameters, in the line of [BL09]) could further improve efficiency.
We only considered infinite domains, but the case of finite domains raises interesting questions concerning result representation: if the answer to a property on the log of Fig. 1a is "neither a nor b", knowing the domain is {a, b, c}, then the answer should be c.
From a usability point of view, adding some syntactic improvements to the PTDAs would make them easier to use for non-experts (for example allowing "update(¬b, )" without guard instead of the self-loop over ℓ 0 in Fig. 1c).

A Details on our algorithm for symbolic monitoring
Intuition Intuitively, for each prefix w(1, i) of w and (γ, ζ) ∈ (Q + ) TP × D VP , our algorithm checks whether w ∈ L(γ|ζ(A)) by a breadth-first search. However, obviously we cannot try each (γ, ζ) ∈ (Q + ) TP × D VP because (Q + ) TP × D VP is an infinite set. Moreover, we have to add unobservable actions to the timed data word w, where the timestamps and the number of unobservable actions are unknown. Therefore, we symbolically represent parameter valuations (γ, ζ) ∈ (Q + ) TP × D VP and concrete states (ℓ, ν, µ) ∈ L × (R ≥0 ) C × D V . The procedure is much like the reachability synthesis of parametric timed automata [JLR15]. At first, we take all the parameter valuations (γ, ζ) ∈ (Q + ) TP × D VP as candidates of D(w, A). Then we try each edge by a breadth-first search. After each edge, we constrain the parameter valuations by the guards, and finally we obtain D(w, A).
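To illustrate the idea of starting from all parameter valuations and constraining them by guards, the following is a minimal sketch in Python. It is not Algorithm 2: it assumes a hypothetical toy automaton with a single timing parameter p, a single clock c, no data, and no unobservable actions. The automaton, the function name `monitor`, and the representation of the candidate set {p | p > lo} by its strict lower bound lo are all assumptions made for this illustration.

```python
from fractions import Fraction

# Hypothetical toy automaton (an assumption, not taken from the paper):
#   l0 --a, reset c--> l1 --b, guard c < p--> l2 (accepting),
# with self-loops on l0 and l1 that ignore the current event.
# A configuration is (location, time at which c was reset, lo), where the
# candidate parameter valuations are symbolically {p | p > lo}.

def monitor(log):
    """Return the bound L such that exactly the valuations p > L are
    accepted on some segment of the log, or None if there is no match."""
    confs = {("l0", None, Fraction(0))}  # initially: every p > 0 is a candidate
    results = set()
    for action, tau in log:
        nxt = set()
        for loc, reset, lo in confs:
            nxt.add((loc, reset, lo))  # self-loop: skip this event
            if loc == "l0" and action == "a":
                nxt.add(("l1", tau, lo))  # reset the clock c at time tau
            elif loc == "l1" and action == "b":
                # guard c < p means p > tau - reset: tighten the lower bound
                results.add(max(lo, tau - reset))
        confs = nxt
    # the sets {p | p > lo} are nested, so their union is given by the minimum
    return min(results) if results else None

print(monitor([("a", 0), ("b", 5), ("a", 6), ("b", 8)]))
```

On this log, the pair (a at 6, b at 8) yields the weakest constraint, so the sketch reports that the toy specification is matched for every p > 2. The actual algorithm manipulates polyhedra over several clocks and parameters rather than a single bound.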

Notations
In the pseudocode, we use V t , V t+ , and V d for the symbolic representation of valuations: V t is a set of pairs (ν, γ) ∈ (R ≥0 ) C × (Q + ) TP of a clock valuation and a time parameter valuation; V t+ is a set of triples (ν, γ, t) ∈ (R ≥0 ) C × (Q + ) TP × R ≥0 of a clock valuation, a time parameter valuation, and an elapsed time; and V d is a set of pairs (µ, ζ) ∈ D V × D VP of a variable valuation and a data parameter valuation. We also use CurrConf, NextConf, and CurrUConf: CurrConf and NextConf are finite sets of triples (ℓ, V t , V d ) and CurrUConf is a finite set of triples (ℓ, V t+ , V d ), where ℓ ∈ L is a location and V t , V d , and V t+ are as defined above. For V t ⊆ (R ≥0 ) C × (Q + ) TP and t ∈ R ≥0 , we
Algorithm 2 gives the pseudocode of our algorithm for symbolic monitoring. In line 1 of Algorithm 2, we set the current configurations CurrConf to be the triple (ℓ 0 , {0} × (Q + ) TP , {µ 0 } × D VP ), which means that we are at the initial location ℓ 0 , the clock (resp. variable) valuation is the initial valuation 0 (resp. µ 0 ), and the timing (resp. data) parameter valuations can be any valuations in (Q + ) TP (resp. D VP ). In lines 3 to 15, we try unobservable transitions. In line 3, we set the current configurations CurrUConf for the unobservable transitions, which is essentially the same as CurrConf, but each V t is equipped with the time elapsed since the latest observable transition. The elapsed time t is used (1) to restrict the unobservable transitions to those between the last observable action a i−1 and the next observable action a i (line 7) and (2) to make the time elapse to τ i (line 13).
For (ℓ, V t+ , V d ) ∈ CurrUConf , after time elapse in line 7, we try unobservable edges from ℓ (lines 8 to 15). We constrain the valuations (V t+ , V d ) by the guards (tg and dg) and conduct the reset and update in lines 9 and 10. If (V t+ , V d ) satisfies the guards, we add the valuations (V ′ t+ , V ′ d ) and the valuations after time elapse to CurrUConf and NextConf , respectively. Moreover, if ℓ ′ ∈ F , we add the parameter valuations (V ′ t+ ↓ TP , V ′ d ↓ VP ) to Result. After trying the unobservable edges, in lines 16 to 25, we try observable edges. Finally, we try unobservable edges after the whole timed data word in lines 27 to 36. The explanation of lines 16 to 25 and lines 27 to 36 is essentially similar to that of lines 3 to 15.
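The step of constraining V d by a data guard can be illustrated on strings, whose domain is infinite. The sketch below is an assumption made for illustration and not necessarily the representation used in SyMon: a set of candidate values for one string parameter is represented either as a single value or as "every string except a finite set". The class `StrParam` and its methods are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class StrParam:
    """Symbolic set of strings for one data parameter (hypothetical):
    {value} if value is not None, otherwise all strings except `excluded`."""
    value: Optional[str] = None
    excluded: FrozenSet[str] = frozenset()

    def constrain_eq(self, s: str) -> Optional["StrParam"]:
        """Intersect with {s}; None means the guard is unsatisfiable."""
        if self.value is not None:
            return self if self.value == s else None
        return None if s in self.excluded else StrParam(value=s)

    def constrain_neq(self, s: str) -> Optional["StrParam"]:
        """Intersect with the complement of {s}; None means unsatisfiable."""
        if self.value is not None:
            return None if self.value == s else self
        return StrParam(excluded=self.excluded | {s})

# Start from all strings, then apply the guards "vp != a" and "vp != b".
candidates = StrParam().constrain_neq("a").constrain_neq("b")
print(candidates)
```

The resulting object denotes "neither a nor b", a set that cannot be enumerated over an infinite domain but can still be reported symbolically.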
Theorem 2 (correctness). For any PTDA A over a data type (D, DE, DU ) and actions Σ ε , and for any timed data word w over Σ, if Algorithm 2 terminates, we have Result = D(w, A) after the execution of Algorithm 2.
Optimization In our implementation, we also employ an optimization to merge adjacent polyhedra in the configurations NextConf if possible. Precisely, we merge (ℓ, V t , V d ) and (ℓ ′ , V ′ t , V ′ d ) in NextConf whenever we have the following:
- ℓ and ℓ ′ are the same.
- V t and V ′ t are the same.
Such a merge is conducted after consuming each entry (a i , τ i , η i ) of the timed word w, i. e., in line 26 of Algorithm 2.
Table 3 shows the detailed results of our experiments.
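The merging optimization of NextConf described above can be sketched as follows. This is a simplified illustration under stated assumptions: V t is any hashable value standing in for a polyhedron, V d is a finite `frozenset` standing in for a symbolic set of data valuations, and merging two configurations is assumed to mean taking the union of their V d components. The function name `merge_configurations` is hypothetical.

```python
from collections import defaultdict

def merge_configurations(next_conf):
    """Merge triples (loc, v_t, v_d) that share the same loc and v_t.

    Assumption: v_t is hashable and v_d is a frozenset, so that merging
    amounts to a set union of the v_d components.
    """
    grouped = defaultdict(frozenset)
    for loc, v_t, v_d in next_conf:
        grouped[(loc, v_t)] |= v_d  # union of data valuations in the group
    return {(loc, v_t, v_d) for (loc, v_t), v_d in grouped.items()}

next_conf = {
    ("l1", "c<p", frozenset({"x=1"})),
    ("l1", "c<p", frozenset({"x=2"})),
    ("l2", "c<p", frozenset({"x=1"})),
}
print(merge_configurations(next_conf))
```

Grouping by the pair (ℓ, V t ) keeps the number of configurations from growing when several runs differ only in their data valuations.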