1 Introduction

Cyber-physical systems are complex environments that combine physical devices (i.e., sensors and actuators) with a software controller. The ubiquity of these systems and dangers associated with their failure require the implementation of mechanisms to monitor, verify and guarantee their correct behaviour.

This paper presents ParetoLib 2.0, a Python tool for offline monitoring of cyber-physical systems. ParetoLib 2.0 allows qualitative and quantitative analysis of real-time signals using specification in an extension of Signal Temporal Logic (STL) [1]. Additionally, ParetoLib 2.0 supports mining parameter valuations of parametric STL (PSTL) [2] properties from trace examples (i.e., a variant of STL that allows parameters in addition to numeric constants).

ParetoLib 2.0 gathers the contributions presented in [3,4,5] and extends previous versions of ParetoLib [6] with the following additions. First, we provide a graphical user interface that helps the end users efficiently interact with the tools and interpret the results. Second, we support additional quantitative STL operators allowing to express a richer set of properties involving counting of events (\(\epsilon\)-count), trends (derivatives), or accumulations (integrals). Third, ParetoLib includes recent algorithms for mining parametric STL specifications that do not require monotonicity assumptions of the validity domain [7]. Finally, we have optimised the tool performance by adding multicore support, type annotations and directives to compile the Python modules into C code.

The paper is organised as follows: Sect. 2 motivates the extension of STL with some examples that are complicated to express without the new STL operators. Section 3 introduces the STL specification language and the quantitative extensions. Next, Sect. 4 presents the technical aspects of ParetoLib 2.0 and details the new contributions. Section 5 presents a case study that uses the new quantitative operators. Section 6 compares ParetoLib 2.0 to similar tools. Finally, Sect. 7 gathers the conclusions and sketches the future work.

2 Motivating examples

Before formally defining the STL language, let us look at some examples of properties that we would like to express. In particular, we look at properties that motivated the development of more expressive and harder to monitor logics.

Example 1

(Stabilisation) The first interesting property is stabilisation around a value that is not known in advance, e.g., “x stays within \(+/-0.05\) units of some value for at least 200 time units”. It is tempting to formalise this property using existential quantification “there exists a threshold v, such that...”, which is possible with first-order logic of signals (and was one of its motivational properties [8]), but it is actually not necessary. Instead, we can compute the minimum and maximum of x over the next 200 time units and compare their distance to \(0.1 = 2 \cdot 0.05\). In some imaginary language, we could write \(max_{[0,200]} x - min_{[0,200]} x \le 0.1\). At this point we propose to separate the aggregate operators from the operator that defines the temporal window, which will be useful later, when the until operator will define a window of variable width. We use the operator \(On_{[a,b]}\) to define the temporal window of constant width and the operators Min and Max (capitalised) to denote the minimum and maximum over the previously defined window.

Signal x stabilises within 0.05 units of an unknown value for 200 time units:

$$\begin{aligned} On_{[0,200]} Max \, x - On_{[0,200]} Min \, x \le 0.1 \end{aligned}$$

Figure 1 shows an example of a signal x(t) (red) performing damped oscillation with the period of 250 time units. Blue and green curves are the maximum and the minimum of x over a siding window \([t,t+200]\). Finally, the orange Boolean signal (its y scale is on the right) evaluates to true (i.e., \(y = 1\)) when the maximum and minimum of x over the next 200 time units are within 0.1.

Fig. 1
figure 1

Damped oscillation x(t) and its maximum and minimum over the window \([t,t+200]\)

Fig. 2
figure 2

Sine wave x(t), its maximum over the window \([t,t+200]\), and whether x(t) is a local maximum on the interval \([t,t+200]\)

Example 2

(Local Maximum) Consider the property: “the current value of x is a minimum or maximum in some neighbourhood of current time point”. Previously, a similar property became a motivation to extend STL with freeze quantifiers [9], but we can also express it by comparing the value of a signal with some aggregate information about its neighbourhood, which we can do similarly to the previous example.

Current value of x is a local maximum on the interval [0, 85] relative to the current time.

$$\begin{aligned} x \ge On_{[0,85]} Max \, x \end{aligned}$$

Figure 2 shows an example of a sine wave x(t) (red) with the period of 250 time units. Blue curve is the maximum x over a siding window \([t,t+85]\). The orange Boolean signal evaluates to true when the current value of x is a maximum for the next 85 time units. Another way of detecting the moments when the signal reaches a local minimum or maximum, regardless the actual value of x, consists on expressing the property in terms of derivatives (D operator). Local maximum and minimum are reached when derivatives are equal to zero, so the following expression will draw a pulse in the surroundings of local peaks in the orange Boolean signal every 125 time units (not shown in Fig. 2).

$$\begin{aligned} On_{[0,85]} \vert D \, x \vert \le 0.1 \end{aligned}$$

Example 3

(Stabilisation Contd.) We want to be able to assert that x becomes stable around some value not for a fixed duration, but until some signal q becomes true. We will be able to do this with our version of until operator.

Signal x is stable within 0.05 units of an unknown value until q becomes true:

$$\begin{aligned} (Max \, x {\mathcal {U}} q) - (Min \, x {\mathcal {U}} q) \le 0.1 \end{aligned}$$

Intuitively, for a given time point, we want the monitor to find the closest future time point, where q holds and compute Min and Max of x over the resulting interval. Note that this property cannot be easily monitored in the framework of “STL with pre-processing”, since it requires the monitor to compute Min and Max over a sliding window of variable width, which depends on the satisfaction signal of q.

Example 4

(Linear Increase) At this point, we can assert x to follow a more complex shape, for example, to increase or decrease with a given slope. Let T denote an auxiliary signal that linearly increases with rate 1 (like a clock of a timed automaton), i.e. we define \(T(t)=t\); this example works as well for \(T(t) = t + c\), where c is a constant. To specify that x increases with the rate 2.5, we assert that the distance from x to \(2.5 \cdot \textrm{T}\) stays within some bounds.

Signal x increases approximately with slope 2.5 during the next 100 time units:

$$\begin{aligned} On_{[0,100]} Max \, \vert x - 2.5 T \vert - On_{[0,100]} Min \, \vert x - 2.5 T \vert \le 0.1 \end{aligned}$$

The previous expression can be rewritten in terms of derivatives in a much more simpler way:

$$\begin{aligned} On_{[0,100]} \vert D \, x \vert \le 2.5 \end{aligned}$$

Example 5

(Integral) Let us consider the following formula:

$$\begin{aligned} \varphi _{f}:= On_{[0,p_1]} I\, {\textbf{x}}_0 \ge c_2 - p_{2} \end{aligned}$$
(1)
Fig. 3
figure 3

Piece-wise constant signal \(w_{10}(t)\)

Fig. 4
figure 4

Validity domain for signal \(w_{10}(t)\) and the formula with integral

Fig. 5
figure 5

Approximation of the validity domain for signal \(w_{10}(t)\) and the formula with integral

Consider the piece-wise constant signal \(w_{10}\) (Fig. 3) with sample values 0, 9 and 18 at times 0, 1 and 2 respectively and it ends at time 10. Let us now analytically compute the solution for the parametric identification problem for the above formula and the signal \(w_{10}\). The value of the parameter \(p_1\) varies between 0 and 10 and the parameter \(p_2\) lies between 0 and 250 and \(c_2\) equals 250. We can analytically compute the validity domain by considering 10 cases corresponding to the 10 segments of \(w_{10}\). We give below the first three parts of the disjunction of constraint systems for representing the validity domain corresponding to the three cases which are \(p_1\in [0,1)\), \(p_1\in [1,2)\) and, \(p_1\in [2,3)\):

$$\begin{aligned}{} & {} \begin{aligned} region_1&:= \{(p_1, p_2) \in {\mathbb {R}}^2 \, \vert \, (0\le p_1<1) \wedge (0*p_1\ge 250-p_2))\}\\ region_2&:= \{(p_1, p_2) \in {\mathbb {R}}^2 \, \vert \, (1\le p_1<2) \wedge (0*1+9*(p_1-1)\ge 250-p_2))\}\\ region_3&:= \{(p_1, p_2) \in {\mathbb {R}}^2 \, \vert \, (2\le p_1<3) \wedge (0*1+9*1+18*(p_1-1-1)\ge 250-p_2)\}\\ \ldots \end{aligned}\nonumber \\{} & {} v_f:= \bigcup _{i=1}^{10} region_i \end{aligned}$$
(2)

The full validity domain, named \(v_f\), is computed as the union of all the regions and shown in green colour in Fig. 4. The validity domain for \((\lnot (\lnot \varphi _f))\) will be \((\lnot (\lnot v_f))\). If we naively compute this series of negations over semi-linear domains then the time complexity for obtaining the exact solution might be high. We implemented in C++ the negation operation on semi-linear domains using Parma Polyhedra Library [10]. We tested in addition to \(w_{10}\) using piece-wise constant signals \(w_{100}\) and \(w_{1000}\) each having 100 and 1000 segments respectively. We computed the validity domains by treating them as semi-linear domains for \(w_{10}\), \(w_{100}\) and \(w_{1000}\) for \((\lnot (\lnot \varphi _f))\) while suitably setting the value of constant \(c_2\) the ranges of values of parameters \(p_1\) and \(p_2\). We observed computation times between 30 and 50 milliseconds for signal \(w_{10}\) and 57264 milliseconds for signal \(w_{100}\) in a conventional laptop. For \(w_{1000}\) the operations timed out after the one hour limit we set. We see that the computation time increases steeply with the size of the signal and becomes prohibitively high for signal \(w_{1000}\) which is of reasonable size.

On the other hand, if we reason in terms of membership queries to an oracle that guides the computation of the validity domains, then the previous \((\lnot (\lnot v_f))\) expression for signal \(w_{10}\) could be approximated efficiently (see Fig. 5) in 145 milliseconds. Our approach requires 1997 and 75932 milliseconds respectively for approximating more than the \(99\%\) of the validity domains of signals \(w_{100}\) and \(w_{1000}\). The C++ and ParetoLib code, as well as the \(w_{10}\), \(w_{100}\) and \(w_{1000}\) signals in CSV format are publicly available.Footnote 1

More examples of properties that require quantitative STL operators can be found in the literature. For instance, the monitoring of the accumulated electricity consumption on smart meters is easily described with integrals over a time window [11]. The rest of the paper includes toy examples and a case study that amplifies the necessity of the new features we present. Further sections illustrate the methods we implement for approximating the validity domain we obtained in the last example, and outperform other monitoring tools that compute exact solution.

3 Extended signal temporal logic

3.1 Boolean semantics

Signal Temporal Logic (STL) [1] focuses on specifying properties of continuous-time signals that represent the evolution of attributes of a cyber-physical system. Formally, a signal is a function \({\textbf{x}}: {\mathbb {T}} \rightarrow {\mathbb {R}}^n\), where the time domain \({\mathbb {T}} = [T_{\textrm{start}}, T_{\textrm{end}}]\) is a closed real interval, and the value \(\vert {\textbf{x}} \vert = T_{\textrm{end}} - T_{\textrm{start}}\) is the duration of the signal. We refer to signal components using their own letters: \(x, \, y, \, z \, \ldots \in {\mathbb {T}} \rightarrow {\mathbb {R}}\).

Then, the grammar of STL can be defined as follows:

$$\begin{aligned} \varphi \,= P(x_1, \dots , x_n) \ \mid \ \lnot \varphi \ \mid \ \varphi _{1} \vee \varphi _{2} \ \mid \ \varphi _{1} {\mathcal {U}}_{[t_{1}, t_{2}]} \varphi _{2} \end{aligned}$$
  • \(P \in {\mathbb {T}} \rightarrow \{\textrm{true}, \textrm{false}\}\) is a predicate over the components of the input signal;

  • \({\mathcal {U}}\) is the temporal operator until; \(\varphi _{1} {\mathcal {U}}_{[t_{1}, t_{2}]} \varphi _{2}\) asserts that \(\varphi _2\) becomes true in the future, with the delay between \(t_1\) and \(t_2\), and \(\varphi _1\) must hold until that point.

The semantics of an STL formula \(\varphi\) with respect to a signal \({\textbf{x}}\) is its satisfaction signal \(\llbracket \phi \rrbracket \in {\mathbb {T}} \rightarrow \{\textrm{true}, \textrm{false}\}\); for every time point t, \(\llbracket \phi \rrbracket (t)\) tells whether or not \(\phi\) holds at that time point. That is, \(({\textbf{x}}, t) \models \varphi \iff \llbracket \varphi \rrbracket (t)\).

$$\begin{aligned} \begin{array}{lcl} \llbracket P(x_1, \dots , x_n) \rrbracket (t) &{}{} = {}&{} P(x_1(t), \dots , x_n(t)) \\ \llbracket \lnot \varphi \rrbracket (t) &{}{} = {}&{} \lnot \llbracket \varphi \rrbracket (t) \\ \llbracket \varphi _1 \vee \varphi _2 \rrbracket (t) &{}{} = {}&{} \llbracket \varphi _1 \rrbracket (t) \vee \llbracket \varphi _2(t) \rrbracket \\ \llbracket \varphi _1 {\mathcal {U}}_{[t_1,t_2]} \varphi _{2} \rrbracket (t) &{}{} = {}&{} \exists t' \in [t + t_1, t + t_2]. \llbracket \varphi _2 \rrbracket (t') \wedge \forall t'' \in [t, t']. \llbracket \varphi _1 \rrbracket (t'') \\ \end{array} \end{aligned}$$

Here, we use the non-strict semantics of until that requires the existence of a time point where both \(\varphi _1\) and \(\varphi _2\) hold.

Somewhat more commonly used temporal operators F (eventually) and G (always) can be defined in terms of until:

$$\begin{aligned} F_{[t_1,t_2]} \varphi \equiv \top {\mathcal {U}}_{[t_1,t_2]} \varphi{} & {}\;G_{[t_1,t_2]} \varphi \equiv \lnot F_{[t_1,t_2]} \lnot \varphi \end{aligned}$$

\(F_{[t_1,t_2]} \varphi\) asserts that \(\varphi\) becomes true with a delay between \(t_1\) and \(t_2\); and \(G_{[t_1, t_2]} \varphi\) asserts that for every time point between \(t_1\) and \(t_2\) time points in the future, \(\varphi\) holds.

$$\begin{aligned} \begin{array}{lcl} \llbracket F_{[t_1, t_2]} \varphi \rrbracket (t) &{}{} = {}&{} \exists t' \in [t + t_1, t + t_2]. \llbracket \varphi \rrbracket (t') \\ \llbracket G_{[t_1, t_2]} \varphi \rrbracket (t) &{}{} = {}&{} \forall t' \in [t + t_1, t + t_2]. \llbracket \varphi \rrbracket (t') \end{array} \end{aligned}$$

Note that the atomic propositions of STL are predicate signals; they are assumed to be produced from sensor readings of some cyber-physical system, but the logic does not specify how exactly this happens. The satisfaction signals are piecewise-constant (since the range is \(\{\textrm{true}, \textrm{false}\}\)), and this is the only kind of signals STL monitors need to work with.

3.2 Quantitative semantics: new additions to STL

STL also admits robustness semantic that measures how far is an observed behaviour from satisfying/violating a specification. In this case, the semantics of a formula \(\varphi\) is interpreted as a piecewise-constant or piecewise-linear function from real-time, i.e., \(\varphi\) formula is a function \(\llbracket \varphi \rrbracket : {\mathbb {T}} \rightarrow {\mathbb {R}}\). Thus, \(\varphi\) has real-valued switching points. This feature facilitates the inclusion of quantitative operators while treating real numbers 0 and 1 as Booleans (i.e., \(0 \equiv \textsf {false}\), \(1 \equiv \textsf {true}\)). The quantitative interpretation of a \(\varphi\) formula involves that Boolean expressions are rewritten in terms of the composition of min/max operators:

$$\begin{aligned} \varphi _{1} \wedge \varphi _{2} \equiv \texttt {min} \{\varphi _{1}, \varphi _{2}\}{} & {} \varphi _{1} \vee \varphi _{2} \equiv \texttt {max} \{\varphi _{1}, \varphi _{2}\}{} & {} \lnot \varphi \equiv {1 - \varphi } \end{aligned}$$

This quantitative interpretation should not be confused with the quantitative semantics for STL called space robustness (known popularly as simply robustness) defined in [12]. This robustness denoted as \(\rho\) indicates by how much margin a given signal w satisfies an STL formula \(\varphi\). It follows that positive and negative robustness values correspond to satisfaction and violation of the formula respectively. One can now see why while our quantitative interpretation requires \(\lnot \varphi \equiv {1 - \varphi }\), robustness changes sign when negation is applied over a formula i.e. \(\rho (\lnot \varphi , w) = -\rho (\varphi , w)\) . Similarly to \(\varphi\) for STL in Boolean semantics, a \(\varphi\) formula may refer to an input signal \({\textbf{x}}\); apply a real-valued function f pointwise to the outputs its \(\varphi\) subformulas; be a temporal operator (\({\mathcal {U}}\), F, G); or, additionally, apply an aggregate function over the sliding window (\(On_{[a,b]}\)). We define it as:

$$\begin{aligned} \begin{aligned}&\llbracket x \rrbracket (t) = x(t) \qquad \qquad \llbracket On_{[a,b]}\psi \rrbracket (t) = \llbracket \psi \rrbracket ([t+a,t+b]) \\&\llbracket f(\varphi _1\dots \varphi _n) \rrbracket (t) = f(\llbracket \varphi _1 \rrbracket (t) \dots \llbracket \varphi _n \rrbracket (t) \end{aligned} \end{aligned}$$

We abuse the notation so that x is both a symbol referring to a component of an input signal and the corresponding real-valued function; similarly, f is both a function symbol and the corresponding function.

The necessity of expressing richer properties in STL motivates the incorporation of STL dialects to ParetoLib that extend the original set of operators with additional quantitative primitives. As previously sketched in the motivating examples, these predicates compute, for instance, the minimum/maximum values of a signal in a time window [3], the value of the derivative in a time point or the integral in a time interval [13], or counts the number of events in a sequence [5]. Section 5 will show a case study that requires the count operator for learning patterns in labelled electrocardiograms (ECG). The following grammar summarises the list of operators that are supported by ParetoLib 2.0:

$$\begin{aligned} \varphi : &= \;c_{\epsilon } \;{\mid }\;x\;{\mid }\;f(\varphi _{1} \cdots \varphi _{n} )\;{\mid }\;f_{ + }^{d} (\varphi )\;{\mid }\;f_{ - }^{d} (\varphi )\;{\mid }On_{{[t_{1} ,t_{2} ]}} \psi \;{\mid }\;\varphi _{1} {\mathcal{U}}_{{[t_{1} ,t_{2} ]}}^{d} \varphi _{2} \hfill \\ \psi \;: &= Min\;\varphi \;{\mid }Max\;\varphi \;{\mid }f^{i} \varphi \hfill \\ \end{aligned}$$

Min, Max. Operations like minimum/maximum or integrals aggregate the values for a time window. The semantics of a \(\psi\) formula is a function \(\llbracket \psi \rrbracket : ({\mathbb {R}} \cup -\infty )\times ({\mathbb {R}} \cup \infty ) \rightarrow {\mathbb {R}}\) from an interval of time with real lower bound to a dual value. A \(\psi\) formula is evaluated on an interval and does not have an output signal by itself. Instead, it supplies an aggregate operation that will be computed when evaluating the containing On formula or until formula. It should be possible to efficiently compute this aggregate operation over a sliding window, and it should preserve the chosen shape of signals.

Since we focus on piecewise-constant and piecewise-linear signals, the two operations that we can immediately offer are Min and Max, which can be efficiently computed over a sliding window using the algorithm of D. Lemire [14, 15], and preserve the piecewise-constant and piecewise-linear shapes.

$$\begin{aligned} \llbracket Min \, \varphi \rrbracket [t_1,t_2] = \texttt {min}_{[t_1,t_2]} \llbracket \varphi \rrbracket \qquad \llbracket Max \, \varphi \rrbracket [t_1,t_2] = \texttt {max}_{[t_1,t_2]} \llbracket \varphi \rrbracket \end{aligned}$$

At this point, we extend until operator in order to align with the quantitative semantics. For every time point t, we either associate an interval ending when \(\varphi _2\) becomes non-zero (i.e., starts holding); or we report that no suitable end point was found. When such interval exists, we evaluate \(\varphi _1\) on it. When the interval does not exist, we produce d. Formally:

$$\begin{aligned} \llbracket \varphi _1 {\mathcal {U}}^{d}_{[t_1,t_2]} \varphi _{2} \rrbracket (t) = \left\{ \begin{array}{lcl} \llbracket \varphi _1 \rrbracket _{[t']}, \text {if } \exists t' \in [t+t_1, t+t_2], \, \text {s.t.} \, \llbracket \varphi _2 \rrbracket (t') \ne 0 \\ d, \text {otherwise} \end{array} \right. \end{aligned}$$

Standard STL eventually and always operators can be expressed in the new language as follows:

$$\begin{aligned} F_{[t_1,t_2]} \varphi \equiv On_{[t_1,t_2]} Max \, \varphi{} & {}\;G_{[t_1,t_2]} \varphi \equiv On_{[t_1,t_2]} Min \, \varphi \end{aligned}$$

Finally, standard STL until operator (denoted as \({\mathcal {U}}^{STL}\)) is expressed as follows:

$$\begin{aligned} \varphi _1 {\mathcal {U}}^{STL}_{[t_1,t_2]} \varphi _{2}(t) \equiv (Min \, \varphi _1) \, {\mathcal {U}}^{0}_{[t_1,t_2]} \varphi _2 \end{aligned}$$

\(\epsilon\)-count (\(c_\epsilon\)). The \(\epsilon\)-count operator counts the number of positive events in a signal (i.e., the moments when a Boolean signal becomes true). The \(\epsilon\)-count operator is not explicitly available at the STL grammar level, but it can be used for counting the number of positive intervals in a Boolean signal via the OracleEpsSTLe wrapper in ParetoLib. The \(\epsilon\)-count operator grounds on the support of a Boolean signal \({\textbf{x}}\), denoted by supp(x), which is the topological closure of \(\{t \mid {\textbf{x}}(t) \ne 0\}\), that is, the smallest closed set that contains all the points where the signal is not false. The \(\epsilon\)-count operator, originally defined in [5], is inspired by the notions of \(\epsilon\)-separated sets and \(\epsilon\)-capacity proposed in [16]. The \(\epsilon\)-count operator contrasts to the Lebesgue’s measure of subsets of \({\mathbb {R}}\) which entails a small measure for a signal whose support is the disjoint union of many intervals of almost-null measure which are quite far apart. Formally speaking, the \(\epsilon\)-count operator is defined as:

Definition 1

(\(\epsilon\)-separated set and \(\epsilon\)-count) Given a Boolean signal \({\textbf{x}}\), a set S of reals is \(\epsilon\)-separated w.r.t. \({\textbf{x}}\) if \(S\subseteq \texttt {supp(}{\textbf{x}}{} \texttt {)}\) and for every \(t,t'\in S\) with \(t\ne t'\), it holds that \(|t-t'|\ge \epsilon\). The \(\epsilon\)-count of a signal \({\textbf{x}}\) is \(c_\epsilon ({\textbf{x}})= \texttt {max} \{\vert S \vert \mid S \text { is } \epsilon\)-separated w.r.t. \({\textbf{x}}\}\).

The \(\epsilon\)-count of a signal \({\textbf{x}}\) is determined in a greedy manner with the following recursive equations:

  • \(c_\epsilon ({\textbf{0}})=0\) when it is applied to the constant signal \({\textbf{0}}\), and

  • \(c_\epsilon ({\textbf{x}})=1+c_\epsilon ({\textbf{x}}')\) where \({\textbf{x}}'(t)=0\) if \(t<\epsilon + \texttt {min}(\texttt {supp(x)})\) and \({\textbf{x}}'(t)={\textbf{x}}(t)\) otherwise.

Finally, the \(\epsilon\)-count has some remaining properties:

  1. (1)

    The \(\epsilon\)-count is null iff it is applied to the constant signal \({\textbf{0}}\).

  2. (2)

    The \(\epsilon\)-count is increasing: if \({\textbf{x}}\le {\textbf{x}}'\) then \(c_\epsilon ({\textbf{x}})\le c_\epsilon ({\textbf{x}}')\).

  3. (3)

    The \(\epsilon\)-count satisfies a triangular inequality: \(c_\epsilon ({\textbf{x}} \vee {\textbf{x}}')\le c_\epsilon ({\textbf{x}})+ c_\epsilon ({\textbf{x}}')\).

Derivatives, Integrals. The derivatives and integrals are calculated using the following mathematical expressions, where \({\textbf{x}}\) stands for the complete signal, \({\textbf{x}}_{\tau }\) is the value of the signal at time \(\tau\), \(f^i_{[t_1,t2]}(\varphi )\)Footnote 2 is the integral for the time interval \([t_1, t_2]\) and \(f^d_{+}(\varphi )\) (\(f^d_{-}(\varphi )\)) represent the right (left) derivatives:

$$\begin{aligned} \llbracket f^d_{+}(\varphi ) \rrbracket&= \frac{d f(\varphi )}{dt^+}{} & {} \llbracket f^i(\varphi ) \rrbracket = \int f(\varphi _{\tau }) \delta \tau \\ \llbracket f^d_{-}(\varphi ) \rrbracket&= \frac{d f(\varphi )}{dt^-}{} & {} \llbracket f^i_{[t_1,t_2]}(\varphi ) \rrbracket = \int ^{t_2}_{t_1} f(\varphi _{\tau }) \delta \tau \end{aligned}$$

Due to the implicit signal discretization, \({\textbf{x}}_{\tau }\), the value of the input signal at time \(\tau\), is obtained as \({\textbf{x}}_{k \delta t}\) where k is the k-th term of the array, \(\delta t\) is the time step determined by the sampling frequency and \(k \delta t \le \tau < (k + 1) \delta t\). The quantitative measurements of derivatives and integrals are then discretized and approximated by the following equations:

$$\begin{aligned} \llbracket f^d_{+}({\textbf{x}}_{\tau }) \rrbracket&= \frac{f({\textbf{x}}_{(k + 1) \delta t}) - f({\textbf{x}}_{k \delta t})}{\delta t}{} & {} \llbracket f^i_{[a,b]}({\textbf{x}}) \rrbracket = \sum ^{k + b / \delta t - 1}_{k' = k + a / \delta t} f({\textbf{x}}_{k' \delta t}) \delta t \\ \llbracket f^d_{-}({\textbf{x}}_{\tau }) \rrbracket&= \frac{f({\textbf{x}}_{k \delta t}) - f({\textbf{x}}_{(k - 1) \delta t})}{\delta t}&\end{aligned}$$

4 Tool presentation

4.1 Internal structure

The internal structure of ParetoLib 2.0 is depicted in Fig. 6. The components that are highlighted in red represent the parts that are updated or completely new with respect to the previous releases of ParetoLib.

Fig. 6
figure 6

Internal structure of ParetoLib 2.0

The architecture of ParetoLib 2.0 is divided into three parts: the GUI, the STL monitor and the modules for mining the parameter valuations in parametric specifications. Depending on the type of STL specification the user introduces in the GUI (parametric vs non-parametric STL properties), ParetoLib directly forwards the query to the STL monitor or starts the mining procedure for extracting the validity domain. In any case, ParetoLib uses external STL monitors as oracles for answering STL queries. ParetoLib 2.0 continues to rely on top of StlEval [17], a C++ engine for evaluating extended Signal Temporal Logic (STL) specifications. ParetoLib interfaces to StlEval with a C API that includes the new quantitative operators (e.g, derivatives or integrals over the signals). Other STL monitors such AMT 2.0 [18] are also supported via Python code (replacing OracleSTLeLib by OracleSTL in Listings 1) or can be easily incorporated thanks to the modularity of the library. As the features of AMT 2.0 have not been enlarged since the previous release of ParetoLib, this STL monitor is omitted from the previous UML diagram for the sake of concision.

The functionalities for the internal modules of ParetoLib remain unchanged. These modules are responsible for learning the parameter valuations of parametric STL specifications. The mining algorithm converts the PSTL formulas into numerical STL instances that queries to StlEval [19]. The current version of ParetoLib optimises the performance of this process by adding support to multi core CPU and type annotations and Cython directives [20] to compile the Python modules into C code during the installation phase. The rest of the code is also updated in order to support recent Python versions and standards.

The graphical user interface (GUI) allows to interact both with the mining library for parametric STL specifications and indirectly with the StlEval tool for numerical STL properties. The GUI simplifies the interaction with StlEval, but experienced users can also run StlEval through the OS terminal for evaluating non-parametric STL specifications. Similarly, ParetoLib 2.0 can also be imported as a Python library and run in the Python console, as illustrated by the code in Listings 1.

The versatility of ParetoLib 2.0 adapts the complexity of the tool to the user level: inexperienced users can interact with STL monitors via the GUI, while programmers and software engineers can code scripts and use ParetoLib 2.0 as a Python library.

figure a

ParetoLib 2.0 provides an all-in-one bundle installer that ships the GUI as well as the DLL and binaries of StlEval for Windows and Linux. Except for the STL monitors (StlEval and AMT), ParetoLib is mainly written in Python. Hence, ParetoLib 2.0 becomes a cross-platform tool and can interact with the rest of the Python ecosystem (e.g., it can be easily imported as a library in a Jupyter Notebook).

4.2 StlEval

Given a non-parametric STL specification, StlEval implements efficient monitors that evaluate the properties on a real-time signal. StlEval has been upgraded in order to natively process new quantitative operators that count the number of events (\(\epsilon\)-count) on a signal or calculate the derivatives and integrals. Previously, StlEval already included operators for computing other quantitative properties such as the maximum/minimum values of a signal in a time window.

StlEval receives a finite representation of the continuous real-time signal in the format of a comma separated values (CSV) array, which is interpreted as a constant piecewise function. Each row of the CSV file has a pair \(\texttt {<timestamp, value>}\). Timestamps are integer values starting in 0, and the samples are floating numbers. Next, StlEval reads a text file with the STL specification. StlEval uses prefix notation, i.e., \((F_{[0,p_1]} (\varvec{\le } (abs \, (D\, {\textbf{x}}_0)) \, p_2))\).

4.3 Mining algorithms in ParetoLib

ParetoLib converts the problem of mining parametric STL equations into solving a multi criteria optimisation problem. Our tool provides three algorithms for computing the Pareto front that will return the valuations that satisfy the parametric STL expressions. All the methods replace the variables in PSTL equations by numerical STL instances that are query to an external STL monitor. Based on the Boolean answer the STL monitor replies, the mining algorithms guide the discovery of the validity domain.

The first method, named BBMJ19 [4], calculates a single Pareto front while the second method, named BDMJ20 [5], obtains the intersection of two Pareto fronts. The BBMJ19 method was originally defined in [21, 22] and shipped in the first release of ParetoLib in [4], while method BDMJ20 was incorporated in later versions. These algorithms assume that the STL properties are monotonic and use the concepts of upset, downset and Pareto front. Recently, method BMNN23 [7] has also been incorporated to ParetoLib, which allows for mining parametric STL specifications that do not require monotonicity assumptions of the validity domain. It provides two types of algorithms that partition the search space of a PSTL property into smaller boxes and compute the validity domain for the parameter valuations based on statistical constraints.

The computational costs of BBMJ19, BDMJ20 and BMNN23 depend on the accuracy requirements for the computation of the validity domain. For instance, BBMJ19 categorises the search space into (in)valid and unexplored area. The volume contained in the unexplored area decreases below \(\delta \%\) of the initial volume \(V_0\) in \({\mathcal {O}}({V_{0}/\delta }^{\kappa _{m}2^{m-1})}\) where m is the number of parameters (or dimensions) in a PSTL formula and \(\kappa _{m}\) is a constant number in the interval \([2m-4, 2^m -3]\) which represents the number of boxes created during the partitioning step.

BMNN23 relies on a partition strategy which can use a fixed or dynamic cell size. In the worst case, it costs \({\mathcal {O}}(M^m) \cdot {\mathcal {O}}(n)\), where m is the number of parameters (or dimensions) in a PSTL formula, and M is a constant that depends on the granularity of the cell size. As BMNN23 uses statistical properties for partitioning the validity domain, the method requires n input signals or samples.

Finally, the evaluation of non-parametric STL properties is commonly handled by STL monitors in linear time with respect to the length of the input signal.

Partial order on \({\mathbb {R}}^n\), Upset, Downset and Pareto Front. Given two vectors \(p, q\in {\mathbb {R}}^n\), vector p is lower than q, denoted by \(p\le q\), if \(\forall i, p_i\le q_i\). A set \({\overline{X}}\) is an upset if for all \(p, q\in {\mathbb {R}}^n\) such that \(p\le q\) if \(p\in {\overline{X}}\) then \(q\in {\overline{X}}\). A set \({\underline{X}}\) is a downset if for all \(p, q\in {\mathbb {R}}^n\) such that \(q\le p\) if \(p\in {\underline{X}}\) then \(q\in {\underline{X}}\). The boundary consisting of all the minimal elements of an upset (or all the maximal elements of a downset) is called a Pareto front in the field of multi-criteria optimisation. The box between two vectors \({\underline{x}}\) and \({\overline{x}}\) with \({\underline{x}}\le {\overline{x}}\) is \({\lfloor {\underline{x}},{\overline{x}}\rceil }=\{y\mid {\underline{x}}\le y\le {\overline{x}}\}\).

Both algorithms in ParetoLib deal with user-given boxes. Consider the box B from Fig. 7. The first Pareto front \(P_1\) separates the upset \(U_1\) and the downset \(D_1\) (Fig. 7a), while the second Pareto front \(P_2\) separates the upset \(U_2\) and the downset \(D_2\) (Fig. 7b). The algorithms rely on the existence of two oracle functions defined over the box B. Given a query point x contained in B, the first oracle function \(o_1(x)\) indicates whether x belongs to \(U_1\) or not (i.e., \(x\in U_1\)). The second oracle function \(o_2(x)\) works similarly and returns the membership of x to \(D_2\) (i.e., \(x\in D_2\)).

Assuming the existence of a Pareto front, the BBMJ19 algorithm takes an oracle and a box to approximately compute the upset and downset. In particular, \(U_1\) and \(D_1\) are approximated using oracle \(o_1\) and box B. Similarly, \(U_2\) and \(D_2\) are computed using oracle \(o_2\) and box B. Please note that BBMJ19 requires that the oracles are monotonic. In these examples, \(o_1\) is increasing: for \(p,q\in B\) with \(p\le q\), it applies that \(o_1(q)\ge o_1(p)\). Similarly, the oracle \(o_2\) is decreasing: for \(p,q\in B\) with \(p\le q\), it applies that \(o_2(q)\le o_2(p)\).

Conversely, the algorithm BDMJ20 computes the intersection of two Pareto fronts. This method applies when we are dealing with one increasing oracle and another decreasing oracle: it involves the intersection of the upset of the increasing oracle and the downset of the decreasing oracle. For example, it can approximately compute \(U_1\cap D_2\) (in Fig. 7c) using oracles \(o_1\), \(o_2\) and the box B. It computes approximately but directly, the intersection of \(U_1\) and \(D_2\). A less efficient way is to separately compute \(U_1\) and \(D_2\) and then intersect them.

Fig. 7
figure 7

Illustration for Pareto algorithms

4.4 Graphical user interface

The main GUI window (Fig. 8) is implemented using PyQt5. Extra libraries such as Matplotlib, Pandas and Seaborn help for displaying the time series or the parameter valuation results (Figs. 11, 12 and 13). The GUI starts when running the command python ParetoLib/GUI/GUI.py.

Fig. 8
figure 8

Main window of ParetoLib 2.0

The tool receives three input files: (1) A finite representation of the continuous signal, (2) A (parametric) STL specification, and (3) Optionally, the list of parameter names. Signal files are in CSV format, and the rest of input files are in textual format with.stl/param extension. These files must comply with the StlEval format. In future versions, we aim at adding a selector which chooses the monitoring tool directly in the user interface (i.e., StlEval or AMT).

Then, the user chooses the values of the parameter range (if any) and configure the execution in the option area. For instance, the user can select the type of interpolation, whether the STL specification is parametric or not, or the mining algorithm in case it is parametric (mining algorithms are BBMJ19 [4], BDMJ20 [5] or BMNN23 [7]). In case of running the BDMJ20 method, the user must provide two STL specification files: one for each oracle, as described in Sect. 4.3.

Running the evaluation of a non-parametric STL formula opens a pop-up window with a message saying if the property is satisfied or violated. If ParetoLib receives a parametric STL formula, it learns the parameter valuations and returns a window that plots the results (Figs. 11, 12 and 13).

Finally, the GUI offers a menu bar with useful options. For instance, it includes buttons for creating, loading and saving ParetoLib projects in a JSON file: this file will contain references to the input file paths and current configuration (e.g., mining method). In case that we desire to compare several Pareto fronts, the classification button will show the most characteristic element of the green region for each validity domain. This feature requires a list of validity domains in ParetoLib file format (see line 26 in Listing 1). Then, it internally computes the Haussdorf distance among the set of points.

4.5 Example

Fig. 9
figure 9

Decaying signal

Fig. 10
figure 10

Triangular signal with 4000 time units period

Using the new version of StlEval, ParetoLib can solve non-parametric and parametric formulas involving quantitative operators via the GUI or the Python terminal. As previously stated in the motivating examples in Sect. 2, some properties in cyber-physical systems are easier expressed in terms of these quantitative primitives than using other STL approaches. Let’s consider again two common scenarios: decaying signals (Fig. 9) and periodic signals (Fig. 10). Detecting whether the signal stabilises or not is equivalent to check that the wave is almost flat, i.e., the derivative bounds are near zero in the future. The usage of a parametric STL expression allows the inspection of the stabilisation property over a period of time and oscillation amplitudes:

$$\begin{aligned} F_{[0,p_1]} \vert D\, {\textbf{x}}_0 \vert \le p_2 \end{aligned}$$
(3)

The parameters in the PSTL equation are \(p_1\) and \(p_2\), \({\textbf{x}}_0\) is the first component of the input signal, D is the derivative operator and \(\vert \cdot \vert\) computes the absolute value.Footnote 3 The parameter \(p_1\) corresponds to the upper time bound of the sliding window, and \(p_2\) corresponds to the oscillation amplitude.

As a result, ParetoLib returns the parameter valuation (Fig. 11) that shows the combinations of parameters \(p_1\) and \(p_2\) that satisfy the property (green region) or falsify it (red region). Figure 12 shows the Boolean signal produced replacing \(p_1\) and \(p_2\) by 100 and 0.05 in Eq. (3). It also textually reports if the STL property is satisfied at time instant 0 (i.e., True).

Fig. 11
figure 11

Parameter valuation result for Eq. (3).

Fig. 12
figure 12

Boolean signal resulting from substitution of \((p_1, p_2) = (100, 0.05)\) in Eq. (3)

Similarly, an example of parametric expression involving integrals is Eq. (4). In this case, \(On_{[0,p_1]}\) refers to the integration interval for the I operator. If \(p_1\) is multiple of the periodicity of oscillation of the signal around \({\textbf{x}}_0(t)=0\), then the integral should also be zero. Figure 13 shows the parameter valuation for Eq. (4) when evaluated over the triangular signal (Fig. 10). Figure 14 shows the integral of the triangular signal (blue) over time. The Boolean signal (orange) is produced replacing \(p_1\) and \(p_2\) by 4000 and 50 in Eq. (4).

$$\begin{aligned} On_{[0,p_1]} I\, {\textbf{x}}_0 \ge 5000 - p_{2} \end{aligned}$$
(4)
Fig. 13
figure 13

Parameter valuation result for Eq. (4)

Fig. 14
figure 14

Boolean signal resulting from substitution of \((p_1, p_2) = (4000, 50)\) in Eq. (4)

The files for running the previous examples are located in the Test folder of the git repository of ParetoLib. These files include the STL specifications, the signals and a file containing the names of the parameter variables. Depending on the monitoring tool (i.e., StlEval or AMT), the file format of the STL specification and signal may change. Additional examples are located in the doc folder, as well as a video demo.

5 Case study

The goal of this section is to illustrate the new features of ParetoLib. In particular, we will show the benefits of the quantitative operators for simplifying the specification of STL properties as well as the computational speed up compared to the first release of ParetoLib.

In this case study, we design a disease detector for electrocardiograms (ECG) using parametric STL. Informally speaking, an ECG is composed of a sequence of approximately flat regions followed by a pulse or heat beat. If the patient suffers from arrhythmia or other kind of heart diseases, the pulse exhibits an anomalous shape. In the following experiments, we use the MIT-BIH Arrhythmia Database of Physionet [23, 24], a public library that contains several annotated ECG’s with thousands of pulses per sample. A portion of ECG 221 is depicted as in Fig. 15 where the signal (blue) and the annotations (orange) come from the database. The annotations for the normal peaks are modelled into a labelling signal that is 1 when a normal peak occurs and 0 everywhere else.

Fig. 15
figure 15

Excerpt from ECG221

Our aim is to develop a pattern predictor (i.e., classifier) that identifies normal peaks and mimic the labelling a doctor would manually make for an input ECG. The learning procedure presented in [5] relies on the \(\epsilon\)-count operator for mining a classifier that minimises the number of mismatches (either false positives, \(\texttt {f}_{+}\), or false negatives, \(\texttt {f}_{-}\)) with respect to the annotations of the samples in the training set. The inferred classifier consists of a PSTL formula and the solution of the parameter space. Obtaining a classifier that mimics a doctor labelling is hence translated into an optimisation problem that minimises both \(\texttt {f}_{+}\) and \(\texttt {f}_{-}\). The problem is solvable by the BDMJ20 method in Sect. 4.3 that deals with two, maybe opposing, optimisation criteria.

Firstly, let’s introduce some formal definitions and notation. A labelled signal \((s,\lambda _{s})\) is a pair of signal s and labelling signal \(\lambda _s\). We use \({\mathcal {S}}\) to denote the given set of labelled signals. We use IPPP to mean Increasing Parametric Pattern Predictor (defined formally in [5]). To put it in simple terms, increasing the parameter values for an IPPP makes the true predictions become more common. We aim at learning parameters p for an IPPP \(\Psi _{p}\) so that for every given labelled signal \((s,\lambda _{s})\), the labelling signals \(\Psi _{p}(s)\) and \(\lambda _{s}\) should match together as much as possible. We measure two kind of mismatches by measuring “how often” the two following signals are true. The false positive signal \(\lnot \lambda _{s}\wedge \Psi _{p}(s)\) indicates when the predictor predicts an occurrence when there is none. The false negative signal \(\lambda _{s}\wedge \lnot \Psi _{p}(s)\) indicates when the predictor misses an actual occurrence.

Given bounds \(\texttt {f}_{+}\), \(\texttt {f}_{-}\) on the allowed \(\epsilon\)-count of false positives and false negatives, we are interested in the following three sets:

$$\begin{aligned} \texttt {Dom+}(\Psi _{},{\mathcal {S}},\texttt {f}_{+})=\{p\mid \forall (s,\lambda _s)\in {\mathcal {S}}, c_\epsilon (\Psi _{p}(s) \wedge \lnot \lambda _{s})\le \texttt {f}_{+}\}, \end{aligned}$$
(5)
$$\begin{aligned} \texttt {Dom-}(\Psi _{},{\mathcal {S}},\texttt {f}_{-})=\{p\mid \forall (s,\lambda _s)\in {\mathcal {S}}, c_\epsilon (\lnot \Psi _{p}(s) \wedge \lambda _{s})\le \texttt {f}_{-}\}, \end{aligned}$$
(6)
$$\begin{aligned} \texttt {DomInter}(\Psi _{},{\mathcal {S}},\texttt {f}_{+},\texttt {f}_{-})=\texttt {Dom+}(\Psi _{},{\mathcal {S}},\texttt {f}_{+}) \cap \texttt {Dom-}(\Psi _{},{\mathcal {S}},\texttt {f}_{-}). \end{aligned}$$
(7)

For convenience, we call them respectively the positive, negative and intersection solution sets. In addition, we are interested in a relaxed version of the identification problem for false positive bounding, by tolerating a difference of \(\sigma\) time units in matching the labels. This can be done by replacing \(\lambda _s\) with the signalFootnote 4\(\mathop {\textrm{F}_{[-\sigma ,\sigma ]}}\lambda _s\) in (5). More concretely, the solution set of the corresponding \(\sigma\)-relaxed problem is:

$$\begin{aligned} \texttt {Dom+}^{\sigma }(\Psi _{},{\mathcal {S}},\texttt {f}_{+})=\{p\mid \forall (s,\lambda _s)\in {\mathcal {S}}, c_\epsilon (\Psi _{p}(s) \wedge \lnot \mathop {\textrm{F}_{[-\sigma ,\sigma ]}}\lambda _s)\le \texttt {f}_{+}\}. \end{aligned}$$

Hence, the corresponding relaxed version of the intersection solution set (7) is

$$\begin{aligned} \texttt {DomInter}^{\sigma }(\Psi _{},{\mathcal {S}},\texttt {f}_{+},\texttt {f}_{-}) =\texttt {Dom+}^{\sigma }(\Psi _{},{\mathcal {S}},\texttt {f}_{+})\cap \texttt {Dom-}(\Psi _{},{\mathcal {S}},\texttt {f}_{-}). \end{aligned}$$
(8)

Note that \(\texttt {Dom+}(\Psi _{},{\mathcal {S}},\texttt {f}_{+})\) and \(\texttt {Dom+}^{\sigma }(\Psi _{},{\mathcal {S}},\texttt {f}_{+})\) are downsets and \(\texttt {Dom-}(\Psi _{},{\mathcal {S}},\texttt {f}_{-})\) is an upset because \(\Psi _{}\) is increasing (recall definitions of upset and downset from Sect. 4.3). The sets \(\texttt {DomInter}(\Psi _{},{\mathcal {S}},\texttt {f}_{+},\texttt {f}_{-})\) and \(\texttt {DomInter}^{\sigma }(\Psi _{},{\mathcal {S}},\texttt {f}_{+},\texttt {f}_{-})\) are intersections of an upset and a downset, which we can compute from the intersection of two Pareto fronts using BDMJ20 algorithm.

It is also of great interest to compute the set of couples \((\texttt {f}_{+},\texttt {f}_{-})\), called set of feasible error bounds, for which a solution exists:

$$\begin{aligned} {\mathcal {P}}(\Psi _{},{\mathcal {S}})=\{(\texttt {f}_{+},\texttt {f}_{-})\mid \texttt {DomInter}^{\sigma }(\Psi _{},{\mathcal {S}},\texttt {f}_{+},\texttt {f}_{-})\ne \emptyset \}. \end{aligned}$$
(9)

The set \({\mathcal {P}}(\Psi _{},{\mathcal {S}})\) is an upset and its minimal elements form a Pareto front. We compute it via membership-queries for couples \((\texttt {f}_{+},\texttt {f}_{-})\). They are done via non-emptiness checking of \(\texttt {DomInter}^{\sigma }(\Psi _{},{\mathcal {S}},\texttt {f}_{+},\texttt {f}_{-})\) which is an easier problem than computing the whole set.

Equation (10) gives a simple and rough characterisation of a normal ECG peak using min/max operators of extended STL [3]. For the sake of readability, the syntax has been slightly simplified by replacing \(On_{[a,b]} Max \, {\textbf{x}}\) by \(Max_{[a,b]} \, {\textbf{x}}\). Equation (10) is equal to 1 if the maximum of \({\textbf{x}}\) on \([t-p_1,t+p_1]\) is above \(-p_3\), and its variation is within the bound \(p_2\) on \([t-c,t-p_1]\) and on \([t+p_1,t+c]\). The parameter domains are \(p_1\in [0,70]\), \(p_2\in [0,1]\) and \(p_3\in [-1,0]\). Here, \(c=70\) is a constant representing an upper limit on \(p_1\). Note that if one increases \(p_1\), \(p_2\) or \(p_3\), the property is easier to achieve. Alternatively, Eq. (11) characterises ECG pulses in terms of derivatives and integrals in a simpler and more readable way.

$$\begin{aligned} \Psi _{(p_1,p_2,p_3)}^{ch_1}({\textbf{x}})(t):= & {} ((Max_{[-c,-p_1]} \, {\textbf{x}} - Min_{[-c,-p_1]} \, {\textbf{x}}) \le p_2)~\wedge \nonumber \\{} & {} ((Max_{[p_1,c]} \, {\textbf{x}} - Min_{[p_1,c]} \, {\textbf{x}}) \le p_2)~\wedge \nonumber \\{} & {} ((Max_{[-p_1,p_1]} \, {\textbf{x}}) \ge -p_3) \end{aligned}$$
(10)
$$\begin{aligned} \Psi _{(p_1,p_2,p_3)}^{ch_2}({\textbf{x}})(t):= & {} (\vert D x \vert< 1) \ {\mathcal {U}}_{[0, p_{1}]} \ (I_{[0, p_{2}]} x) < p_{3} \end{aligned}$$
(11)

For ECG 221, the predictor in Eq. (10) can match the labelling with no false negatives (\(\texttt {f}_{-}=0\)) and only a single false positive (\(\texttt {f}_{+}=1\)): it identifies the blue peak in the right hand side of Fig. 15 as a valid pulse. We denote the singleton signal set containing the labelled ECG 221 signal as \({\mathcal {S}}_{221}\). Table 1 shows the execution time that ParetoLib requires for computing the solution set (\(\texttt {DomInter}^{\sigma }(\Psi ^{ch_1},{\mathcal {S}}_{221},\texttt {f}_{+}=1,\texttt {f}_{-}=0))\) of the parametric space \((p_1,p_2,p_3)\) with different accuracy. The value of \(V_\delta\) represents the percentage of the parametric space that remains unexplored. For instance, the green area in Fig. 16 represents the solution set of the parameters when the unexplored area is less than \(0.01\%\).

The Cython code is around \(3\%\) faster than pure Python: the evaluation of the STL expressions, which is the most time consuming part of the mining method, was already externalised to StlEval in previous versions of ParetoLib. Additionally, ParetoLib 2.0 supports the parallel execution of the BDMJ20 method in multicore CPU’s, while ParetoLib 1.0 did not. The experiments are run in a Intel(R) Core(TM) i5-3570K CPU @3.40GHz 16GB, Ubuntu 22.04 and Python 3.10.

Table 1 Execution time (seconds) of the solution set of the parametric space for different scenarios
Fig. 16
figure 16

Solution set of the parametric space \(\texttt {DomInter}^{\sigma }(\Psi ^{ch_1},{\mathcal {S}}_{221},\texttt {f}_{+}=1,\texttt {f}_{-}=0))\) for ECG221 (\(V_\delta =0.01\%\))

Equation (10) is also evaluated for labelling pulses on ECG 100 and ECG 123 with different success. We denote the singleton signal sets containing the labelled ECG 100 and ECG 123 signals as \({\mathcal {S}}_{100}\) and \({\mathcal {S}}_{123}\) respectively. Table 2 compares the number of mismatches, either false positives or false negatives, between the original labelling and the one produced by Eq. (10). For the particular case of ECG 100, our solution provides 33 false positives. According to the Pareto front (corresponding to the upset \({\mathcal {P}}(\Psi ^{ch_1},{\mathcal {S}}_{100})\)) in Fig. 17, it is impossible to obtain a better solution set using the current specification. This issue does not come from a wrong definition of peak, but from pulses that are separated further than usual. See the first pulse after 2000 time units in Fig. 18. It is called an Atrial Premature Beat which should not be considered a normal peak. Equation 12 enriches Eq. (10) with explicit information about the distance between peaks, which improves the accuracy of our peak detector and reduces the number of errors to 3 false positives and 1 false negative. peak is an alias for Eq. (10).

Table 2 Execution time (seconds) and number of \(\texttt {f}_{-}\)/\(\texttt {f}_{+}\) of the solution set of the parametric space for different ECGs
Fig. 17
figure 17

Pareto front for ECG100 using Eq. (10)

Fig. 18
figure 18

Excerpt from ECG100

$$\begin{aligned} \Psi _{(p_1,p_2,p_3,p_4)}^{ch_3}({\textbf{x}})(t):= peak{(p_1,p_2,p_3)} ~\wedge F_{[0,p_4]} \, peak{(p_1,p_2,p_3)} \end{aligned}$$
(12)

6 Related work

Similar tools for offline monitoring of cyber-physical systems are AMT 2.0 [18], S-Taliro [25] and Breach [26]. AMT 2.0 is a Java tool while S-Taliro and Breach are MATLAB/Simulink toolboxes. AMT 2.0 analyses input traces with extended Signal Temporal Logic (xSTL), which combines STL and Timed Regular Expressions (TRE). On the other hand, S-Taliro and Breach include an explicit model of the cyber-physical system for simulating traces. S-Taliro is specialised in falsification of temporal logic properties by finding trajectories with minimum robustness. Breach allows the exhaustive inspection of the cyber-physical model by systematically varying configuration parameters. Next, py-stl [27, 28] implements a similar approach to our BBMJ19 method for mining parametric STL equations. The authors of py-stl apply it to compute a family of distance metrics for a set of monotonic specifications that are mined from time-series learning [29]. Finally, MiniPaSTeL [30] implements the BMNN23 method for mining STL parameters [7]. It uses RTAMT [31] as monitoring tool.

However, these tools include neither the quantitative operators nor a full support for the specification mining methods presented in this paper. Additionally, ParetoLib 2.0 has an adapter for externalising the run of non-parametric STL queries in AMT 2.0 and is also prepared for connecting with the Matlab environment. Similarly to py-stl, ParetoLib 2.0 supports the comparison of the inferred validity domains for mined PSTL specifications. Our tool can compute the Haussdorf distances among a set of validity domains and return the characteristic point for each of them.

Recently, tools for online monitoring of systems that cannot be statically analysed have appeared. RTAMT [31] supports the online diagnosis with STL and IA-STL, an interface-aware extension for defining interface properties about input/output signals of a cyber-physical systems. Finally, RTLola [32] (previously named StreamLab) and TeSSLa [33] complete the state of the art. These tools are focused on evaluating real-time streams instead of signals.

7 Conclusion

This paper introduces the new features of ParetoLib 2.0, a Python tool for the evaluation and parameter synthesis of Signal Temporal Logic specifications (STL). The main changes w.r.t. the previous version of the tool consist of (1) A graphical user interface that simplifies the usage to the end users, (2) The support of additional quantitative operators that involve counting of events (\(\epsilon\)-count), trends (derivatives), or accumulations (integrals), and (3) The implementation of new mining methods. Besides, we have optimised the performance of the library for mining the parameter valuations of parametric STL specifications by completing the multi core support and compiling the kernel modules into C code. The compilation of the internal Python modules into C code is transparent to the end users as the transformation is automatically executed during the installation of the Python library. Finally, we introduce the Haussdorf distance in order to compare the inference of validity domains of PSTL specifications.

As future work, we propose to include more options in the graphical user interface that are already available in ParetoLib as command-line options (e.g., the selection of parallel computations or the STL engine). On the other hand, we want to include other types of interpolation beyond the constant pairwise (e.g. linear). We plan to add new logic operators such as the probability operator \(\textbf{Pr}_{\sim \lambda }\varphi\), where \(\sim \in \{<, \le , \ge , >\}\) and \(\lambda \in [0, 1]\). Finally, we will implement a natural language processor for writing STL specifications in natural language.