Keywords

figure a
figure b

1 Introduction

Cyber-physical systems (CPS), that embed cyber technologies into physical systems, have been widely deployed in safety-critical domains, such as transportation, healthcare, power and energy. Due to their safety-critical nature, the behaviors of CPS require formal verification to guarantee their satisfaction to formal specifications that are usually expressed in temporal logics, e.g., Signal Temporal Logic (STL) [20]. Given an STL specification, monitoring (a.k.a. runtime verification) [2] is an effective approach for checking whether a trace of system execution satisfies the specification.

Monitoring can be achieved either offline or online. In STL monitoring, an offline monitor can report a real value (called robustness) that indicates how robustly an STL formula \(\varphi \) is satisfied or violated by a complete execution trace, based on the STL robust semantics [9, 13]. By contrast, an online monitor targets partial execution traces at runtime, reporting the satisfaction of an STL formula \(\varphi \) by the partial trace so far at each time instant. Due to the lack of a complete trace, a typical online monitor, e.g., the robust monitor in [6], reports a robustness interval \([\mathrm {[R]}^\textsf{L},\mathrm {[R]}^\textsf{U} ]\) telling the possibly reachable values of the robustness under any suffix trace, where \(\mathrm {[R]}^\textsf{L}\) and \(\mathrm {[R]}^\textsf{U}\) are the lower and upper bounds respectively. In this way, the satisfaction of \(\varphi \) can be inferred from the computed robustness interval, e.g., \(\varphi \) is violated if \(\mathrm {[R]}^\textsf{U} \) is negative.

Fig. 1.
figure 1

An illustrative example of online robust monitoring and causation monitoring of STL formula \(\varphi : \ \Box _{[0,45]}(\Diamond _{[0,5]}v<10)\).

Online robust monitoring [6] suffers from the information masking problem [24, 26, 27], as visualized by the example in Fig. 1. The specification in this example requires that, during [0, 45], the speed v of a car should never be over 10 for 5 time units. The top plot reports a trace v that violates the system specification. When using the classic online monitor [6] reported in the middle plot, the value of \(\mathrm {[R]}^\textsf{U}\) keeps on decreasing and becomes negative at \(b=10\). It then reaches the minimum value around \(b=14\) and stagnates at that value in the remaining of the monitoring. As a result, the system evolution is not faithfully reflected by the monitor. For instance, the monitor does not deliver that the system actually recovers (i.e., \(v<10\)) at \(b=20\) and that the status \(v>10\) persisting for more than 5 time units happens once again from \(b = 25\).

Causation monitoring [26] emerges to tackle the information masking issue of robust monitoring. As shown in the bottom plot of Fig. 1, at each instant b, an online causation monitor returns \(\mathcal {[\mathscr {R}]}^\ominus \) (called a violation causation distance) and \(\mathcal {[\mathscr {R}]}^\oplus \) (called a satisfaction causation distance), that respectively reflect how far the trace value at b is from being a causation to the violation and the satisfaction of the specification. For instance, \(\mathcal {[\mathscr {R}]}^\ominus \) at \(b = 12\) is negative, which implies that the trace value at \(b = 12\) is considered as a causation to the violation of the partial trace, because the status \(v>10\) that has been persisting for more than 5 time units continues at \(b=12\). At \(b= 20\), \(\mathcal {[\mathscr {R}]}^\ominus \) becomes positive, which implies that the trace value at \(b = 20\) is no more a causation to the violation, because v becomes less than 10 at \(b= 20\). From Fig. 1, we can see that compared to robust monitoring, causation monitoring can reflect more information about system evolution, such as the recovery of the system at \(b = 20\) and the recurrence of the status \(v>10\) persisting for more than 5 time units from \(b = 25\).

Contributions. In this paper, we present a tool CauMon, that implements an efficient online causation monitoring algorithm of STL. Compared to the plain monitoring algorithm [26] derived directly from the definition of causation monitoring, our algorithm features the use of two dynamic programming strategies, by which we can significantly reduce redundant computation of the causation distances during monitoring of system executions. We adopt dynamic programming for two purposes: first, we record and reuse the intermediate monitoring results of the sub-formulas using several worklists for each of the sub-formulas, such that computational cost is only spent for incremental results, not for existing ones; moreover, we adapt a sliding window algorithm [18] to accelerate the computation of monitoring results in the presence of nested temporal operators.

We implement CauMon in C++, and it can be compiled to an executable that can be easily interfaced with CPS in different formalisms. We demonstrate the advantages of CauMon in informativeness, by comparing it with the existing online robust monitor in [6]. Moreover, we also evaluate the efficiency of CauMon, by comparing it with the plain causation monitor derived from the definition of causation monitoring directly, and also the online robust monitor [6]. The experimental results show that CauMon can indeed deliver more information about system evolution compared to the robust monitor; moreover, while the plain implementation of causation monitoring is not applicable to handling nested temporal operators in practice, CauMon can significantly reduce the monitoring time costs and achieve comparable efficiency with the robust monitor.

Related Work. Online monitoring is an approach that monitors system executions at runtime, and different approaches have been proposed for different temporal logics, such as LTL [4, 5], MTL [14, 19, 25], and STL [6, 7, 15, 16, 23]. For online monitoring of STL, most of existing approaches [6, 7, 15, 16] and tools [1, 8, 22, 23] are based on its robust semantics and provide a quantitative value or interval to characterize the system runtime status. Consequently, these approaches suffer more or less from the issue of information masking. In [27], we proposed a reset mechanism that resets a monitor whenever it detects the recovery of specification violation. However, [27] does not propose new semantics and so it does not improve informativeness of monitors between two resets.

2 Preliminaries

2.1 Signal Temporal Logic

Let \(T\in \mathbb {R}_{+}\) be a positive real. A signal (i.e., a trace of system execution) is a function \(\textbf{v} :[0,T]\rightarrow \mathbb {R}^ d \), where T is the time horizon, \( d \in \mathbb {N}_{+}\) is the dimension. In practice, each signal dimension concerns with a signal variable that has a certain physical meaning, e.g., \(\texttt{speed}\), \(\texttt{RPM}\) of a car. We fix a set \(\textbf{Var}\) of variables and assume that a signal \(\textbf{v} \) is spatially bounded by a hyper-rectangle \(\Omega \), i.e., for any \(t\in [0, T]\), \(\textbf{v} (t)\in \Omega \).

Signal temporal logic (STL) [20] can express desired properties of hybrid systems. We review the syntax and robust semantics [9, 13] of STL.

Definition 1

(STL Syntax). In STL, the atomic propositions \(\alpha \) and the formulas \(\varphi \) are respectively defined as follows:

$$\begin{aligned} \begin{aligned} \alpha \,::\equiv \, f(w_1, \dots , w_K) > 0 \qquad \varphi \,::\equiv \, \alpha \mid \bot \mid \lnot \varphi \mid \varphi \wedge \varphi \mid \Box _{I}\varphi \mid \Diamond _{I}\varphi \mid \varphi \mathbin {\mathcal {U}_{I}} \varphi \end{aligned} \end{aligned}$$

Here f is a K-ary function \(f:\mathbb {R}^K \rightarrow \mathbb {R}\), \(w_1, \dots , w_K \in \textbf{Var}\), and I is a closed non-singular interval in \(\mathbb {R}_{\ge 0}\), i.e., \(I=[l,u]\), where \(l,u \in \mathbb {R}\) and \(l<u\). \(\Box , \Diamond \) and \(\mathcal {U}\) are temporal operators, which are known as always, eventually and until, respectively. The always operator \(\Box \) and eventually operator \(\Diamond \) are two special cases of the until operator \(\mathcal {U}\), where \(\Diamond _{I}\varphi \equiv \top \mathbin {\mathcal {U}_{I}}\varphi \) and \(\Box _{I}\varphi \equiv \lnot \Diamond _{I}\lnot \varphi \). Other common connectives such as \(\vee , \rightarrow \) are introduced as syntactic sugar: \(\varphi _1\vee \varphi _2\equiv \lnot (\lnot \varphi _1\wedge \lnot \varphi _2)\), \(\varphi _1\rightarrow \varphi _2 \equiv \lnot \varphi _1 \vee \varphi _2\).

Definition 2

(STL Robust Semantics). Let \(\textbf{v} \) be a signal, \(\varphi \) be an STL formula and \(\tau \in \mathbb {R}_{+}\) be an instant. The robustness \(\textrm{R}(\textbf{v}, \varphi , \tau ) \in \mathbb {R}\cup \{+\infty ,-\infty \}\) of \(\textbf{v} \) w.r.t. \(\varphi \) at \(\tau \) is defined by induction on the construction of formulas, as follows,

$$\begin{aligned} &\textrm{R}(\textbf{v}, \alpha , \tau ) \;:=\; f(\textbf{v} (\tau )) \qquad \textrm{R}(\textbf{v}, \lnot \varphi , \tau ) \;:=\; -\textrm{R}(\textbf{v}, \varphi , \tau ) \\ &\textrm{R}(\textbf{v}, \varphi _1\wedge \varphi _2, \tau ) \;:=\; \min \left( \textrm{R}(\textbf{v}, \varphi _1, \tau ), \textrm{R}(\textbf{v}, \varphi _2, \tau )\right) \\ &\textstyle \textrm{R}(\textbf{v}, \Box _{I}\varphi , \tau ) \;:=\; \inf _{t\in \tau + I}{\textrm{R}(\textbf{v}, \varphi , t)} \qquad \textrm{R}(\textbf{v}, \Diamond _{I}\varphi , \tau ) \;:=\; \sup _{t\in \tau + I}{\textrm{R}(\textbf{v}, \varphi , t)} \\ &\textstyle \textrm{R}(\textbf{v}, \varphi _1 \mathbin {\mathcal {U}_{I}} \varphi _2, \tau ) \;:=\; \sup _{t \in \tau + I} \min \left( \textrm{R}(\textbf{v}, \varphi _2, t), \inf _{t' \in [\tau , t)} \textrm{R}(\textbf{v}, \varphi _1, t') \right) \end{aligned}$$

where \(\tau + [l, u]\) denotes the shifted interval \([l+\tau , u+\tau ]\).

The Boolean semantics of STL, i.e., whether \((\textbf{v}, \tau ) \models \varphi \) or not, can be inferred from the quantitative robust semantics in Definition 2, namely, if \(\textrm{R}(\textbf{v}, \varphi , \tau ) > 0\), it implies \((\textbf{v}, \tau )\models \varphi \); and if \(\textrm{R}(\textbf{v}, \varphi , \tau ) < 0\), it implies \((\textbf{v}, \tau )\not \models \varphi \).

2.2 Online Robust Monitoring of STL

Online monitoring concerns the satisfaction of a partial signal \(\textbf{v}_{0:b}: [0, b]\rightarrow \mathbb {R}^ d \) w.r.t. an STL formula \(\varphi \). We define a completion of \(\textbf{v}_{0:b} \) as a signal \(\textbf{v}: [0,T]\rightarrow \mathbb {R}^ d \) (\(b\le T\)) such that \(\forall t\in [0, b], \textbf{v} (t) = \textbf{v}_{0:b} (t)\). A completion \(\textbf{v} \) can be written as the concatenation of \(\textbf{v}_{0:b}\) with a suffix signal \(\textbf{v}_{b:T}\), i.e., \(\textbf{v} =\textbf{v}_{0:b} \cdot \textbf{v}_{b:T} \).

Definition 3

(Online Robust Monitor [6]). Let \(\textbf{v}_{0:b} \) be a partial signal, and let \(\varphi \) be an STL formula. We denote by \(\texttt{R}^{\alpha }_{\textrm{max}} \) and \(\texttt{R}^{\alpha }_{\textrm{min}} \) the possible maximum and minimum bounds of the robustness \(\textrm{R}(\textbf{v}, \alpha , \tau )\)Footnote 1. Then, an online robust monitor returns a sub-interval \(\mathrm {[R]}(\textbf{v}_{0:b}, \varphi , \tau ) \subseteq [\texttt{R}^{\alpha }_{\textrm{min}}, \texttt{R}^{\alpha }_{\textrm{max}} ]\) at instant b, which is defined as follows, by induction on the construction of formulas.

$$\begin{aligned} &\mathrm {[R]}(\textbf{v}_{0:b}, \alpha , \tau ) \;:=\; {\left\{ \begin{array}{ll} \big [f\left( \textbf{v}_{0:b} (\tau )\right) , f\left( \textbf{v}_{0:b} (\tau )\right) \big ] & \text {if } \tau \in [0, b]\\ \big [\texttt{R}^{\alpha }_{\textrm{min}}, \texttt{R}^{\alpha }_{\textrm{max}} \big ]& \text {otherwise} \end{array}\right. }\\ &\mathrm {[R]}(\textbf{v}_{0:b}, \lnot \varphi , \tau ) \;:=\; - \mathrm {[R]}(\textbf{v}_{0:b}, \varphi , \tau ) \\ &\mathrm {[R]}(\textbf{v}_{0:b}, \varphi _1\wedge \varphi _2, \tau ) \;:=\; \min \Big (\mathrm {[R]}(\textbf{v}_{0:b}, \varphi _1, \tau ), \mathrm {[R]}(\textbf{v}_{0:b}, \varphi _2, \tau )\Big ) \\ &\textstyle \mathrm {[R]}(\textbf{v}_{0:b}, \Box _I\varphi , \tau ) \;:=\; \inf _{t\in \tau +I}\Big (\mathrm {[R]}(\textbf{v}_{0:b}, \varphi , t)\Big ) \\ &\textstyle \mathrm {[R]}(\textbf{v}_{0:b}, \varphi _1\mathbin {\mathcal {U}_{I}}\varphi _2, \tau ) \;:=\; \sup _{t\in \tau +I}\min \left( \mathrm {[R]}(\textbf{v}_{0:b}, \varphi _2, t), \inf _{t'\in [\tau , t)}\mathrm {[R]}(\textbf{v}_{0:b}, \varphi _1, t') \right) \end{aligned}$$

Here, f is defined as in Definition 1, and the arithmetic rules over intervals \(I=[l, u]\) are defined as follows: \(-I:=[-u, -l] \text { and } \min (I_1,I_2) :=[\min (l_1, l_2), \min (u_1, u_2)]\).

We denote by \(\mathrm {[R]}^\textsf{U}(\textbf{v}_{0:b}, \varphi , \tau )\) and \(\mathrm {[R]}^\textsf{L}(\textbf{v}_{0:b}, \varphi , \tau )\) the upper and lower bounds of \(\mathrm {[R]}(\textbf{v}_{0:b}, \varphi , \tau )\) respectively. Intuitively, this interval \(\mathrm {[R]}(\textbf{v}_{0:b}, \varphi , \tau )\) indicates the set of robustness values possibly reached by the completion of \(\textbf{v}_{0:b} \), under any suffix signal \(\textbf{v}_{b:T} \). This interval can be used to derive a 3-valued verdict for a given \(\textbf{v}_{0:b} \), that signifies the satisfaction of \(\textbf{v}_{0:b} \) w.r.t. the specification \(\varphi \): if \(\mathrm {[R]}^\textsf{L}(\textbf{v}_{0:b}, \varphi , \tau ) > 0\), it implies \(\texttt{true}\), i.e., \(\textbf{v}_{0:b} \) satisfies \(\varphi \); if \(\mathrm {[R]}^\textsf{U}(\textbf{v}_{0:b}, \varphi , \tau ) < 0\), it implies \(\texttt{false}\), i.e., \(\textbf{v}_{0:b} \) violates \(\varphi \); otherwise, it returns \(\texttt{unknown}\).

3 Overview of Causation Monitoring

As mentioned in §1, the information masking issue of online robust monitors has been identified as a problem in [24, 26, 27]. The problem arises from the monotonicity of online robust monitors, i.e., during evolution of the signal \(\textbf{v}_{0:b} \), \(\mathrm {[R]}^\textsf{U}(\textbf{v}_{0:b}, \varphi , \tau )\) monotonically decreases and \(\mathrm {[R]}^\textsf{L}(\textbf{v}_{0:b}, \varphi , \tau )\) monotonically increases. The formal statement of this problem can be found in [26].

Online Causation Monitoring. Causation monitoring is proposed in [26] as a solution to the problem. Specifically, instead of monitoring robustness that indicates whether a partial trace violates the specification, it monitors whether the increment of the trace at each instant is the causation to the violation of the specification. Here, the definition of causation follows the online trace diagnostics [3, 27] that returns a (violation or satisfaction) epoch, which is a set of signal segments that sufficiently triggers the violation or satisfaction of the partial signal. Intuitively, an epoch can be considered as an explanation of a violation or satisfaction; the formal definition of epoch can be found in [3, 27]. Having the trace diagnostic result at each instant, causation monitoring aims to report such a verdict: if the current instant b of \(\textbf{v}_{0:b} \) is included in the violation epoch, it is considered as a violation causation; if b is included in the satisfaction epoch, it is considered as a satisfaction causation; otherwise, it is irrelevant.

The causation monitor proposed in [26] achieves this goal as follows: at each instant, it computes two quantities \(\mathcal {[\mathscr {R}]}^\ominus \left( \textbf{v}_{0:b}, \varphi , \tau \right) \) and \(\mathcal {[\mathscr {R}]}^\oplus \left( \textbf{v}_{0:b}, \varphi , \tau \right) \), that respectively indicate the distances of the current instant b from being a violation causation and a satisfaction causation. The formal definition of causation distances \(\mathcal {[\mathscr {R}]}^\ominus \left( \textbf{v}_{0:b}, \varphi , \tau \right) \) and \(\mathcal {[\mathscr {R}]}^\oplus \left( \textbf{v}_{0:b}, \varphi , \tau \right) \) are presented in Definition 4.

Table 1. The definitions of violation and satisfaction causation distances

Definition 4

(Online Causation Monitor [26]). Let \(\textbf{v}_{0:b} \) be a partial signal and \(\varphi \) be an STL formula. At an instant b, an online causation monitor returns a violation causation distance \(\mathcal {[\mathscr {R}]}^\ominus \left( \textbf{v}_{0:b}, \varphi , \tau \right) \) and a satisfaction causation distance \(\mathcal {[\mathscr {R}]}^\oplus \left( \textbf{v}_{0:b}, \varphi , \tau \right) \), as defined in Table 1.

The causation verdict, regarding whether b is a causation or not, can be inferred by the results of the online causation monitor in Definition 4, as follows:

  • if \(\mathcal {[\mathscr {R}]}^\ominus \left( \textbf{v}_{0:b}, \varphi , \tau \right) < 0\), then b is a violation causation;

  • if \(\mathcal {[\mathscr {R}]}^\oplus \left( \textbf{v}_{0:b}, \varphi , \tau \right) > 0\), then b is a satisfaction causation;

  • otherwise, i.e., \(\mathcal {[\mathscr {R}]}^\ominus \left( \textbf{v}_{0:b}, \varphi , \tau \right) > 0\) and \(\mathcal {[\mathscr {R}]}^\oplus \left( \textbf{v}_{0:b}, \varphi , \tau \right) < 0\), b is irrelevant.

Below, we use an example to illustrate how online causation monitor works.

Example 1

Consider the example in Fig. 1. As indicated by the robust monitor, the specification is violated by the signal after \(b = 10\). By online trace diagnostics (see [27]), at \(b = 19\), we can obtain a violation epoch \(\{\langle v< 10, t\rangle \mid t\in [5, 19]\}\), which implies that the violation so far is caused by the signal values v during [5, 19]. Since \(b = 19\) is included in this epoch, b is then considered as a violation causation. On the other hand, we can also compute the violation causation distance \(\mathcal {[\mathscr {R}]}^\ominus \left( \textbf{v}_{0:b}, \varphi , 0\right) = -1 < 0\) by Definition 4, which also indicates that b is a violation causation.

Similarly, at \(b = 23\), we obtain an epoch \(\{\langle v< 10, t\rangle \mid t\in [5, 20]\}\), in which \(b = 23\) is not included, so \(b = 23\) is irrelevant. This is also shown by computing the causation distances \(\mathcal {[\mathscr {R}]}^\ominus \left( \textbf{v}_{0:b}, \varphi , 0\right) = 2 > 0\) and \(\mathcal {[\mathscr {R}]}^\oplus \left( \textbf{v}_{0:b}, \varphi , 0\right) = \texttt{R}^{\alpha }_{\textrm{min}} < 0\).

Note that the result of causation monitoring is not monotonic, e.g., while \(b = 19\) is considered as a violation causation, \(b = 23\) is not. This feature is clearly shown by the visualized result of causation monitoring in Fig. 1.

Relationship with Robust Monitors. As indicated by Fig. 1 and Example 1, the causation monitor is not monotonic, and thus it delivers more information about system evolution. We refer to [26] for a more detailed explanation. Below, Lemma 1 states that the online causation monitor in Definition 4 refines the online robust monitor in Definition 3, in the sense that the monitoring results of robust monitors can be inferred from that of causation monitors. In other words, the information delivered by causation monitors is a superset of that can be delivered by classic robust monitors.

Lemma 1

The causation monitor in Definition 4 refines the classic online robust monitor in Definition 3, in the sense that the monitoring results of the robust monitor can be reconstructed from the results of the causation monitor, as follows:

$$\begin{aligned} &\mathrm {[R]}^\textsf{U}(\textbf{v}_{0:b}, \varphi , \tau ) = \inf _{t\in [0, b]}\mathcal {[\mathscr {R}]}^\ominus \left( \textbf{v}_{0:t}, \varphi , \tau \right) , \;\mathrm {[R]}^\textsf{L}(\textbf{v}_{0:b}, \varphi , \tau ) = \sup _{t\in [0, b]}\mathcal {[\mathscr {R}]}^\oplus \left( \textbf{v}_{0:t}, \varphi , \tau \right) \end{aligned}$$

4 Efficient Causation Monitoring

In [26], a straightforward way of synthesizing an online causation monitor has been provided that follows Definition 4. However, the synthesized monitor may not be sufficiently efficient to be deployed in practice, due to the high computational complexity when handling nested temporal operators.

Example 2

Consider the specification \(\varphi \equiv \Box _{[0,45]}(\Diamond _{[0,5]}v<10)\) in Fig. 1. For convenience, we name the sub-formulas of \(\varphi \) as follows: \( \varphi ' \equiv \Diamond _{[0,5]}(v< 10), \alpha \equiv (v< 10) \). Consider the computation of \(\mathcal {[\mathscr {R}]}^\ominus \left( \textbf{v}_{0:b}, \varphi , 0\right) \) that contains nested temporal operators. According to Definition 4, we need to compute as follows:

$$\begin{aligned} &\textstyle \mathcal {[\mathscr {R}]}^\ominus \left( \textbf{v}_{0:b}, \varphi , 0\right) = \inf _{t\in [0,45]}\mathcal {[\mathscr {R}]}^\ominus \left( \textbf{v}_{0:b}, \varphi ', t\right) \\ = &\textstyle \inf _{t\in [0,45]}\left( \inf _{t'\in [t, t+5]}\left( \max \left( \mathcal {[\mathscr {R}]}^\ominus \left( \textbf{v}_{0:b}, \alpha , t'\right) , \mathrm {[R]}^\textsf{U}(\textbf{v}_{0:b}, \alpha , t')\right) \right) \right) \end{aligned}$$

During this computation, for a fixed \(t'\), \(\mathcal {[\mathscr {R}]}^\ominus \left( \textbf{v}_{0:b}, \alpha , t'\right) \) and \(\mathrm {[R]}^\textsf{U}(\textbf{v}_{0:b}, \alpha , t')\) are repeatedly computed as long as it holds that \(t'\in [t, t+5]\) for any \(t\in [0,45]\). This results in numerous redundant computations, which can significantly diminish the efficiency of causation monitoring.

We introduce two dynamic programming strategies, i.e., intermediate result recording and sliding window, for accelerating causation monitoring.

4.1 Intermediate Result Recording

Our efficient causation monitoring algorithm is presented in Algorithm 1. The basic idea of the algorithm is to record the intermediate monitoring results, by maintaining several worklists for each of the sub-formulas of an STL formula \(\varphi \), so as to avoid redundant computations.

Sub-formulas and Evaluation periods. Given an STL formula \(\varphi \), the sub-formula set \(\textsf{SF}(\varphi )\) of \(\varphi \) is defined as follows (see an example in Example 2):

$$\begin{aligned} \begin{aligned} &\textsf{SF}(\alpha ) :=\{\alpha \} \qquad \textsf{SF}(\lnot \varphi ) :=\{\lnot \varphi \}\cup \textsf{SF}(\varphi ) \\ &\textsf{SF}(\varphi _1\wedge \varphi _2) :=\{\varphi _1\wedge \varphi _2\}\cup \textsf{SF}(\varphi _1)\cup \textsf{SF}(\varphi _2) \qquad \textsf{SF}(\Box _I\varphi ) :=\{\Box _I\varphi \}\cup \textsf{SF}(\varphi ) \end{aligned} \end{aligned}$$
Algorithm 1
figure c

Efficient online causation monitoring

At the beginning of Algorithm 1, for each sub-formula \(\psi \in \textsf{SF}(\varphi )\), we initialize four worklists, including \(\textsf{Cau}^{\ominus }[{\psi }]\) and \(\textsf{Cau}^{\oplus }[{\psi }]\) that record the violation and satisfaction causation distances, \(\textsf{Rob}^{\textsf{U}}[{\psi }]\) and \(\textsf{Rob}^{\textsf{L}}[{\psi }]\) that record the upper and lower robustness bounds.

Each of these four lists is defined over the evaluation period \( Eva (\psi )\) of each \(\psi \), that is, intuitively, the time interval that includes all the instants t such that \(\mathcal {[\mathscr {R}]}^\ominus \left( \textbf{v}_{0:b}, \psi , t\right) \) is needed for the computation of \(\mathcal {[\mathscr {R}]}^\ominus \left( \textbf{v}_{0:b}, \varphi , 0\right) \) (see Definition 4, and it also holds for satisfaction distances). Formally, given \(\psi \in \textsf{SF}(\varphi )\), the evaluation period is computed as follows. First, \( Eva [\varphi , 0]\), a set that includes the evaluation periods of all the sub-formulas \(\psi '\in \textsf{SF}(\varphi )\), is computed recursively as follows:

$$\begin{aligned} & Eva [\alpha , t] :=\{\langle \alpha ,t \rangle \} \qquad Eva [\lnot \varphi , t] :=\{\langle \lnot \varphi , t\rangle \}\cup Eva [\varphi , t]\\ & Eva [\varphi _1\wedge \varphi _2, t] :=\{\langle \varphi _1\wedge \varphi _2, t\rangle \}\cup Eva [\varphi _1, t]\cup Eva [\varphi _2, t]\\ & Eva [\Box _I\varphi , t] :=\{\langle \Box _I\varphi , t\rangle \}\cup \textstyle \bigcup _{t'\in t + I} Eva [\varphi , t'] \end{aligned}$$

Then, the evaluation period \( Eva [\varphi , 0](\psi )\) (we denote as \( Eva (\psi )\) for simplicity) of \(\psi \) is defined as \( Eva (\psi ) = \{t\mid \langle \psi , t\rangle \in Eva [\varphi , 0]\}\).

Example 3

Consider the sub-formulas \(\varphi \), \(\varphi '\) and \(\alpha \) of the formula \(\varphi \) in Example 2. The evaluation periods of these sub-formulas can be computed as follows.

  • First, \(\textstyle Eva [\varphi , 0] = \{\langle \varphi , 0\rangle \} \cup \bigcup _{t'\in [0,45]}\{\langle \varphi ', t'\rangle \} \cup \bigcup _{t''\in [0,50]}\{\langle \varphi ', t''\rangle \} \);

  • Then, we can obtain the evaluation periods for \(\varphi \), \(\varphi '\) and \(\alpha \), respectively as follows: \( Eva (\varphi ) = 0, Eva (\varphi ') = [0,45], Eva (\alpha ) = [0,50]. \)

Monitoring Algorithm. During the growth of partial signal \(\textbf{v}_{0:b} \), Algorithm 1 monitors \(\textbf{v}_{0:b} \) by calling the function UpdateCau at each instant, which updates the worklists \(\textsf{Cau}^{\ominus }[{\psi }]\) and \(\textsf{Cau}^{\oplus }[{\psi }]\), such that \(\textsf{Cau}^{\ominus }[{\psi }](t)\) equals to \(\mathcal {[\mathscr {R}]}^\ominus \left( \textbf{v}_{0:b}, \psi , t\right) \) and \(\textsf{Cau}^{\oplus }[{\psi }](t)\) equals to \(\mathcal {[\mathscr {R}]}^\oplus \left( \textbf{v}_{0:b}, \psi , t\right) \), as defined in Definition 4. Unlike the plain monitoring algorithm in Example 2, when updating \(\textsf{Cau}^{\ominus }[{\psi }]\) and \(\textsf{Cau}^{\oplus }[{\psi }]\), Algorithm 1 relies on the worklists of the sub-formulas of \(\psi \) that are already available rather than computing the causation distances of the sub-formulas from scratch, thereby saving monitoring time significantly. We illustrate this process in Example 4.

As shown in Algorithm 1, UpdateCau is defined recursively based on the structure of an STL formula. In Algorithm 1, we only show a part of the operators; other operators can be derived by the STL syntax (Definition 1) and the presented operators.

  • The updates for \(\alpha \) and \(\lnot \varphi \) exactly follow Definition 4;

  • The update for \(\varphi _1\wedge \varphi _2\) requires not only the worklists of causation distances of sub-formulas, but also the worklists of robustness bounds of sub-formulas (according to Definition 4), so it calls the auxiliary function \(\textsc {UpdateRob}\) (see Algorithm 2 in [6]) to update the worklists \(\textsf{Rob}^{\textsf{L}}[{\varphi _1}]\) and \(\textsf{Rob}^{\textsf{L}}[{\varphi _2}]\) of robustness bounds of sub-formulas. The function \(\textsc {UpdateRob}\) was originally introduced in [6]. It updates the worklists of robustness bounds in a similar way to what UpdateCau does for causation distances, i.e., when updating the worklists of robustness bounds for a formula \(\psi \), it also relies on the worklists of robustness bounds for its sub-formulas, rather than computing from scratch.

  • The update for \(\Box _I\varphi \) requires to compute the minimum of \(\textsf{Cau}^{\ominus }[{\varphi }]\) over the time window \(t+I\), for each \(t\in Eva (\Box _I\varphi )\). To efficiently update the list, we adapt a sliding window algorithm [18] (elaborated on in §4.2) that, given a list \(\{a_1, \ldots , a_N\}\) and an index window [lu] (\(l, u\in \mathbb {N}\)), computes the minimum \(\min _{j\in [i+l, i+u]}{a_j}\) over the window \([i+l, i+u]\) for each \(i\in \{1, \ldots , N-u\}\). In Algorithm 1, \(\textsf{Trans} (I)\) transforms a time interval I to the index representation of a window, simply by considering the sampling frequencyFootnote 2.

Algorithm 2
figure d

Sliding window algorithm

4.2 Sliding Window Algorithm

The sliding window algorithm [18] is given in Algorithm 2, which computes the local minimum \(\min _{j\in [i+l, i+u]}a_j\) over the span \([i+l, i+u]\), for each \(i\in \{1, \ldots , N-u\}\). This is yet another dynamic programming strategy, by using a double-ended queue Q (Line 2) to record the comparisons that have been performed between the elements in Q, and thus eliminate redundant comparisons.

Algorithm 2 takes as input a list \(\{a_1, \ldots , a_N\}\) and a window [lu]. Initially, the window is placed to contain the elements \(a_{1+l}, \ldots , a_{1+u}\), and the index \(l+1\) is pushed to the back of Q (Line 4). Then it enters a loop to traverse the list (Line 5). Inside the loop, first, it checks the condition whether a result should be reported, i.e., the elements in the initial window have been traversed (Line 6). From which that on, it reports the list element with the index at the front of Q at each loop (Line 7). Then, it recursively removes the indexes \(a_{\textsc {Back(Q)}}\) at the back of Q, if the element \(a_i\) of the current index i is greater than the \(a_{\textsc {Back(Q)}}\) (Line 8-11), and pushes the current index i to the back of Q (Line 12). If the index at the front of Q is already out of the scope of the window, then it is also removed (Line 14).

Note that, the index of local minimum over the window is always stored at the front of Q, and that is why it is returned at each loop in Line 7 of Algorithm 2. In this way, the comparisons that have been performed between list elements are reflected by the state of Q, thus reducing redundancies significantly.

Example 4

Consider the specification in Example 2. According to Algorithm 1, our computation of \(\mathcal {[\mathscr {R}]}^\ominus \left( \textbf{v}_{0:b}, \varphi , 0\right) \) at \(b = 19\) relies on updating the worklists for the sub-formulas \(\textsf{SF}(\varphi )\) of \(\varphi \). As shown in Table 2, we need to maintain five worklists for the sub-formulas of \(\varphi \). Each worklist is defined over the evaluation period of the sub-formula, as computed in Example 3.

Due to recursive call of UpdateCau and UpdateRob, our algorithm first updates the worklist \(\textsf{Cau}^{\ominus }[{\alpha }]\) and \(\textsf{Rob}^{\textsf{U}}[{\alpha }]\) for \(\alpha \), as shown by the corresponding rows in Table 2. Then, \(\textsf{Rob}^{\textsf{U}}[{\varphi '}]\) is updated by taking the local maximum over each window [0, 5] (by UpdateRob), and \(\textsf{Cau}^{\ominus }[{\varphi '}]\) is updated based on \(\textsf{Cau}^{\ominus }[{\alpha }]\) and \(\textsf{Rob}^{\textsf{U}}[{\varphi '}]\) (by Algorithm 1). Finally, \(\textsf{Cau}^{\ominus }[{\varphi }]\) can be updated based on \(\textsf{Cau}^{\ominus }[{\varphi '}]\).

Table 2. The worklists for computing \(\mathcal {[\mathscr {R}]}^\ominus \left( \textbf{v}_{0:b}, \varphi , 0\right) \) at \(b = 19\)

Compare our algorithm with the naive one in Example 2. In our algorithm, the computations of \(\mathcal {[\mathscr {R}]}^\ominus \left( \textbf{v}_{0:b}, \alpha , t\right) \) and \(\mathrm {[R]}^\textsf{U}(\textbf{v}_{0:b}, \alpha , t)\) for any specific t both happen only once; then, they are recorded in the worklists and when they are used to update the worklists of other formulas, they can be directly read from the worklist, which does not take new computation costs. Therefore, our algorithm avoids the repeated computation of \(\mathcal {[\mathscr {R}]}^\ominus \left( \textbf{v}_{0:b}, \alpha , t\right) \) and \(\mathrm {[R]}^\textsf{U}(\textbf{v}_{0:b}, \alpha , t)\), as shown in the monitoring process in Example 2.

5 Demonstration of CauMon

We implemented CauMon in C++, which can be easily compiled to interface with CPS implemented in any formalism. In this section, we showcase the usage of CauMon, by compiling it to be a MATLAB API, based on the MEX functions of MATLAB. A code snippet for monitoring \(\varphi \) in Fig. 1 is shown as follows.

figure e

The function cau_mon serves as the interface to call our causation monitor, which requires four arguments, namely, a trace, a list of signal names, an STL specification and a non-negative \(\tau \). Their formats are required as follows:

  • trace is a high-dimensional array. Its first row is an array of time stamps; from second row on, each row denotes a signal concerned with the STL;

  • signal is a string that denotes a list of signals that correspond to each row of trace. Multiple signals can be separated by commas.

  • spec is a string that denotes an STL formula. The syntax of spec follows the standard in Breach [8].

  • tau is a non-negative real, as \(\tau \) defined in Definition 4.

During the monitoring of system execution, the program iteratively calls cau_mon with an updated trace, which is obtained by a function get_trace. The function get_trace requires an interface with the system being monitored. Typically, such an interface is provided by a CPS simulator; for example, in the case of Simulink models, one can use the function sim to obtain the output of Simlink models. At each instant, cau_mon can return a violation causation distance vio_d and a satisfaction causation distance sat_d, exactly as defined in Definition 4.

6 Experimental Evaluation

We introduce the experimental evaluation of our tool \(\textsf {CauMon} \). For the purpose of comparison, we also integrated two baseline monitors, namely, RobM (the online robust monitor from [6]) and PCauM (the plain implementation of online causation monitor from Definition 4) into our tool, as an option that can be specified by users. Our tool is publicly available in our Github repositoryFootnote 3.

6.1 Experiment Setting

Benchmarks. To evaluate the efficiency of our proposed monitoring algorithm, we collect traces from four MATLAB Simulink models that are commonly-used in the CPS community [10,11,12, 21].

Automatic Transmission (AT) implements a transmission controller of an automotive system. It has been widely-used recently [10,11,12] as a benchmark of CPS testing and monitoring. It contains 64 blocks in total, including a stateflow chart that represents the transmission control logic. The outputs of AT, including \(\texttt{speed}\) and \(\texttt{RPM}\), reflect the state of the automotive system. The specifications that AT are expected to hold are listed as follows:

  • \(\varphi ^{\textsf{AT}}_{1} \equiv \Box _{[0,30]}(\texttt{speed} < 110)\): \(\texttt{speed}\) should be always low;

  • \(\varphi ^{\textsf{AT}}_{2} \equiv \Box _{[0,29]}(\texttt{speed} > 70 \rightarrow \Diamond _{[0,1]}(\texttt{speed} > 80))\): there should be a drastic \(\texttt{speed}\) change from 70 to 80;

  • \(\varphi ^{\textsf{AT}}_{3} \equiv \Box _{[0,27]}(\texttt{speed} > 50 \rightarrow \Diamond _{[1,3]}(\texttt{RPM} < 3000))\): whenever \(\texttt{speed}\) is higher than 50, \(\texttt{RPM}\) should be below 3000 in 3 s;

  • \(\varphi ^{\textsf{AT}}_{4} \equiv \Box _{[0,29]} (\texttt{speed} < 100) \vee \Box _{[29,30]} (\texttt{speed} > 65)\): there should not be a drastic speed change at the end of the simulation;

Abstract Fuel Control (AFC) is a powertrain control system released by Toyota [17] and has been widely used as a benchmark in CPS community [10,11,12, 21]. The system takes external inputs including engine speed and pedal angle, and adjusts the air-to-fuel ratio to ensure the performance of the powertrain system. The output of AFC includes the air-to-fuel ratio \(\texttt{AF}\) and a reference value \(\texttt{AFref}\). The specifications of AFC are listed as follows:

  • \(\varphi ^{\textsf{AFC}}_{1} \equiv \Box _{[10, 50]}(\left| \texttt{AF}-\texttt{AFref} \right| < 0.1 )\): the deviation of \(\texttt{AF}\) from \(\texttt{AFref}\) should always be small;

  • \(\varphi ^{\textsf{AFC}}_{2} \equiv \Box _{[10, 48.5]}\Diamond _{[0,1.5]}\left( \left| \texttt{AF}-\texttt{AFref} \right| < 0.08\right) \): a large deviation should not last for too long;

Fig. 2.
figure 2

Comparison between CauMon and RobM, in terms of the information provided by different monitors. In each of the sub-figures, the top plot is the signals, the middle plot is the result of online robust monitors [6], and the bottom plot is the result of online causation monitors.

Neural Network Controller (NN). This is a magnetic levitation system that has been used as a benchmark in [10, 12, 21, 29]. It takes one input signal, \(Ref\in [1, 3]\), which is the reference for the position Pos of a magnet suspended above an electromagnet. The specification of NN is a complicated one:

  • \(\varphi ^{\textsf{NN}}_{1} \equiv \Box _{[0,18]}(\lnot \texttt{close} \rightarrow \texttt{reach})\), where \(\texttt{close} \equiv |Pos - Ref| \le \rho + a\cdot |Ref|\), and \(\texttt{reach} \equiv \Diamond _{[0,2]}(\Box _{[0,1]}(\texttt{close}))\): the position should approach the reference position in some seconds when they are far. Here, \(a = 0.04\) and \(\rho = 0.004\).

Free Floating Robot (FFR) models a robot moving in a 2D space [28, 30]. It takes as input the four boosters of the robot, and outputs four signals that are the position in terms of coordinate values xy. The specification of FFR is as follows:

  • \(\varphi ^{\textsf{FFR}}_{1} \equiv \lnot (\Diamond _{[0,5]}(\Box _{[0,2]}(x \in [1.5, 1.7] \wedge y \in [1.5, 1.7])))\): it requires the robot not to stay in an area for at least 2 s.

Experiment Design. For each specification, we first generate 10 signals by running the Simulink models with random inputs, and then for each signal, we apply the three monitors and compare the total time they spent in monitoring the signal. To handle the fluctuation of monitoring time due to the environmental noises, we repeat each monitoring process for five times and report the average monitoring time as the result. While some monitors may be not efficient and not terminated within a reasonable time budget, we set 5000 s as a timeout.

Our experiments are conducted on Amazon EC2 c4.2xlarge instances (2.9 GHz Intel Xeon E5-2666 v3, 15 GB RAM).

6.2 Evaluation

Efficacy of CauMon . To demonstrate the supriority of CauMon compared to RobM in terms of informativeness, we show two plots in Fig. 2, which depict the signals produced by Simulink models and the monitoring results of CauMon and RobM. Due to the page limit, examples of other specifications are presented in our Github repository. We can observe that, while the upper robustness curves in these three plots provided by RobM are always monotonic, the violation causation distances provided by CauMon are not monotonic, thus they can deliver more information about system evolution. For instance, in Fig. 2b, the spikes shown in the monitoring result of CauMon reflect that the deviation between \(\texttt{AF}\) and \(\texttt{AFref}\) is greater than the threshold 0.1 in \(\varphi ^{\textsf{AFC}}_{1}\) for more than once. However, this information can not be provided by the monotonic curve of the upper robustness from the monitoring result given by RobM.

Efficiency of CauMon . The experimental results are presented in Table 3. Each sub-table reports the results of monitoring 10 signals for the corresponding specification. The first three columns of each sub-table report the total monitoring time of the three monitors, and the last two columns report the comparison between the proposed CauMon and the two baseline monitors RobM and CauMon, computed by \(\varDelta A = \nicefrac {(\textsf {CauMon}- A)}{A}\), where A is either \(\textsf {PCauM} \) or \(\textsf {RobM} \).

First, by the comparison between \(\textsf {PCauM} \) and \(\textsf {CauMon} \), we observe that in our experiments, CauMon always outperforms PCauM, with an improvement of at least \(17.5\%\) (in \(\varphi ^{\textsf{AT}}_{4}\)). In particular, in those complex specifications that have nested temporal operators, such as \(\varphi ^{\textsf{AT}}_{2}\), \(\varphi ^{\textsf{AT}}_{3}\) and \(\varphi ^{\textsf{NN}}_{1}\), PCauM can take extremely long time to monitor the traces: for \(\varphi ^{\textsf{AT}}_{2}\) and \(\varphi ^{\textsf{AT}}_{3}\), it takes around 1900 s; for \(\varphi ^{\textsf{NN}}_{1}\) that has nested operators of three levels, it even gets timeout. In the case of \(\varphi ^{\textsf{AT}}_{2}\) and \(\varphi ^{\textsf{AT}}_{3}\), each instant of the signals takes about \(\frac{1900}{30\times 100}=0.63\) seconds, which is 63 times longer than the sampling period (0.01 s for AT) of the traces. This performance can lead to severe delays if PCauM is deployed in practice, and the main reason is, as shown in Example 2, because of the redundant computations of intermediate results. In contrast, CauMon does not suffer from the problem and provides a monitoring performance that allows its usage in a real-world setting, possibly even a synchronous monitoring setting.

Table 3. Experimental results of the three monitors RobM, PCauM and CauMon Time is reported in seconds.

The performance of PCauM is severely subject to the complexity of the specification. In the problems that have simpler formulas (e.g., \(\varphi ^{\textsf{AT}}_{1}\), \(\varphi ^{\textsf{AFC}}_{1}\)), PCauM is not very slow. However, for specifications that have nested operators (e.g., \(\varphi ^{\textsf{AT}}_{2}\), \(\varphi ^{\textsf{AT}}_{3}\), \(\varphi ^{\textsf{NN}}_{1}\)), PCauM becomes not feasible in practice. By contrast, CauMon suffers much less from this issue. Even for complex specifications, CauMon is still very efficient and its performance is always comparable with RobM.

By the comparison between RobM and CauMon, we observe that, the performance of CauMon is at the same magnitude of RobM, and so its performance is comparable with RobM. While in some cases CauMon is not as fast as RobM, the performance difference is not very large. This is acceptable, regarding that our CauMon can provide more information about system evolution than RobM. There also happens that CauMon is faster than RobM, with simple specifications that have no nested temporal operators, such as \(\varphi ^{\textsf{AT}}_{1}\) and \(\varphi ^{\textsf{AFC}}_{1}\). This is because monitoring simple specifications, like \(\Box _I\alpha \), mainly needs the causation distance lists \(\textsf{Cau}^{\ominus }[{\alpha }]\) and \(\textsf{Cau}^{\oplus }[{\alpha }]\) of atomic proposition \(\alpha \), and by Defs. 3 and 4, \(\textsf{Cau}^{\ominus }[{\alpha }]\) and \(\textsf{Cau}^{\oplus }[{\alpha }]\) have simpler shape than \(\textsf{Rob}^{\textsf{U}}[{\alpha }]\) and \(\textsf{Rob}^{\textsf{L}}[{\alpha }]\).

7 Conclusion and Future Work

We propose an efficient approach for online causation monitoring. Our approach features two dynamic programming strategies, namely, the use of causation distance lists that record intermediate results, and the use of sliding window algorithms that accelerate the causation computation of temporal operators. Experiments show that, in terms of efficiency, our approach significantly outperforms the plain causation monitor in [26], and is comparable with the existing online robust monitors that deliver less information about system evolution than ours.

As future work, we would like to explore the application of the proposed monitors for system behavior analysis. For instance, by causation monitoring, we can obtain the information about when a cause of the specification violation happens, and this can be used for fault analysis such as localization and repair.