Mining of Extended Signal Temporal Logic Specifications with ParetoLib 2.0

This paper presents ParetoLib 2.0, a Python tool for offline monitoring and specification mining of cyber-physical systems. ParetoLib 2.0 uses Signal Temporal Logic (STL) as the formalism for specifying properties on time series. ParetoLib 2.0 builds upon other tools for evaluating and mining STL expressions, and extends them with new functionalities. In particular, ParetoLib 2.0 implements a novel mining algorithm that relies on a set of new quantitative operators for trace analysis in STL as well as an original graphical user interface. Finally, the performance is optimized with respect to previous releases of the tool.


Introduction
Cyber-physical systems are complex environments that combine physical devices (i.e., sensors and actuators) with a software controller.The ubiquity of these systems and dangers associated with their failure require the implementation of mechanisms to monitor, verify and guarantee their correct behaviour.
This paper presents ParetoLib 2.0, a Python tool for offline monitoring of cyber-physical systems.ParetoLib 2.0 allows qualitative and quantitative analysis of real-time signals using specification in an extension of Signal Temporal Logic (STL) [1].Additionally, ParetoLib 2.0 supports mining parameter valuations of parametric STL (PSTL) [2] properties from trace examples (i.e., a variant of STL that allows parameters in addition to numeric constants).
ParetoLib 2.0 gathers the contributions presented in [3][4][5] and extends previous versions of ParetoLib [6] with the following additions.First, we provide a graphical user interface that helps the end users efficiently interact with the tools and interpret the results.Second, we support additional quantitative STL operators allowing to express a richer set of properties involving counting of events (ϵ-count), trends (derivatives), or accumulations (integrals).Third, ParetoLib includes recent algorithms for mining parametric STL specifications that do not require monotonicity assumptions of the validity domain [7].Finally, we have optimised the tool performance by adding multicore support, type annotations and directives to compile the Python modules into C code.
The paper is organised as follows: Section 2 motivates the extension of STL with some examples that are complicated to express without the new STL operators.Section 3 introduces the STL specification language and the quantitative extensions.Next, Section 4 presents the technical aspects of ParetoLib 2.0 and details the new contributions.Section 5 presents a case study that uses the new quantitative operators.Section 6 compares ParetoLib 2.0 to similar tools.Finally, Section 7 gathers the conclusions and sketches the future work.

Motivating Examples
Before formally defining the STL language, let us look at some examples of properties that we would like to express.In particular, we look at properties that motivated the development of more expressive and harder to monitor logics.

Example 1 (Stabilisation)
The first interesting property is stabilisation around a value that is not known in advance, e.g., "x stays within +/ − 0.05 units of some value for at least 200 time units".It is tempting to formalise this property using existential quantification "there exists a threshold v, such that. . .", which is possible with first-order logic of signals (and was one of its motivational properties [8]), but it is actually not necessary.Instead, we can compute the minimum and maximum of x over the next 200 time units and compare their distance to 0.1 = 2 • 0.05.In some imaginary language, we could  write max [0,200] x − min [0,200] x ≤ 0.1.At this point we propose to separate the aggregate operators from the operator that defines the temporal window, which will be useful later, when the until operator will define a window of variable width.We use the operator On [a,b] to define the temporal window of constant width and the operators M in and M ax (capitalised) to denote the minimum and maximum over the previously defined window.Signal x stabilises within 0.05 units of an unknown value for 200 time units: On [0,200] M ax x − On [0,200] M in x ≤ 0.1 Figure 1 shows an example of a signal x(t) (red) performing damped oscillation with the period of 250 time units.Blue and green curves are the maximum and the minimum of x over a siding window [t, t + 200].Finally, the orange Boolean signal (its y scale is on the right) evaluates to true (i.e., y = 1) when the maximum and minimum of x over the next 200 time units are within 0.1.
Example 2 (Local Maximum) Consider the property: "the current value of x is a minimum or maximum in some neighbourhood of current time point".Previously, a similar property became a motivation to extend STL with freeze quantifiers [9], but we can also express it by comparing the value of a signal with some aggregate information about its neighbourhood, which we can do similarly to the previous example.Current value of x is a local maximum on the interval [0, 85] relative to the current time.
x ≥ On [0,85] M ax x Figure 2 shows an example of a sine wave x(t) (red) with the period of 250 time units.Blue curve is the maximum x over a siding window [t, t+85].The orange Boolean signal evaluates to true when the current value of x is a maximum for the next 85 time units.Another way of detecting the moments when the signal reaches a local minimum or maximum, regardless the actual value of x, consists on expressing the property in terms of derivatives (D operator).Local maximum and minimum are reached when derivatives are equal to zero, so the following expression will draw a pulse in the surroundings of local peaks in the orange Boolean signal every 125 time units (not shown in Figure 2).
Example 3 (Stabilisation Contd.)We want to be able to assert that x becomes stable around some value not for a fixed duration, but until some signal q becomes true.We will be able to do this with our version of until operator.Signal x is stable within 0.05 units of an unknown value until q becomes true: Intuitively, for a given time point, we want the monitor to find the closest future time point, where q holds and compute M in and M ax of x over the resulting interval.Note that this property cannot be easily monitored in the framework of "STL with pre-processing", since it requires the monitor to compute M in and M ax over a sliding window of variable width, which depends on the satisfaction signal of q.
Example 4 (Linear Increase) At this point, we can assert x to follow a more complex shape, for example, to increase or decrease with a given slope.Let T denote an auxiliary signal that linearly increases with rate 1 (like a clock of a timed automaton), i.e. we define T (t) = t; this example works as well for T (t) = t + c, where c is a constant.To specify that x increases with the rate 2.5, we assert that the distance from x to 2.5 • T stays within some bounds.Signal x increases approximately with slope 2.5 during the next 100 time units: The previous expression can be rewritten in terms of derivatives in a much more simpler way: Example 5 (Integral) Let us consider the following formula: Consider the piece-wise constant signal w 10 (Figure 3) with sample values 0, 9 and 18 at times 0, 1 and 2 respectively and it ends at time 10.Let us now analytically compute the solution for the parametric identification problem for the above formula and the signal w 10 .The value of the parameter p 1 varies between 0 and 10 and the parameter p 2 lies between 0 and 250 and c 2 equals 250.We can analytically compute the validity domain by considering 10 cases corresponding to the 10 segments of w 10 .We give below the first three parts of the disjunction of constraint systems for representing the validity domain The full validity domain, named v f , is computed as the union of all the regions and shown in green colour in Figure 4.The validity domain for (¬(¬φ f )) will be (¬(¬v f )).If we naively compute this series of negations over semi-linear domains then the time complexity for obtaining the exact solution might be high.We implemented in C++ the negation operation on semi-linear domains using Parma Polyhedra Library [10].We tested in addition to w 10 using piece-wise constant signals w 100 and w 1000 each having 100 and 1000 segments respectively.We computed the validity domains by treating them as semi-linear domains for w 10 , w 100 and w 1000 for (¬(¬φ f )) while suitably setting the value of constant c 2 the ranges of values of parameters p 1 and p 2 .We observed computation times between 30 and 50 milliseconds for signal w 10 and 57264 milliseconds for signal w 100 in a conventional laptop.For w 1000 the operations timed out after the one hour limit we set.We see that the computation time increases steeply with the size of the signal and becomes prohibitively high for signal w 1000 which is of reasonable size.
On the other hand, if we reason in terms of membership queries to an oracle that guides the computation of the validity domains, then the previous (¬(¬v f )) expression for signal w 10 could be approximated efficiently (see Figure 5) in 145 milliseconds.Our approach requires 1997 and 75932 milliseconds respectively for approximating more than the 99% of the validity domains of signals w 100 and w 1000 .The C++ and ParetoLib code, as well as the w 10 , w 100 and w 1000 signals in CSV format are publicly available 1 .
More examples of properties that require quantitative STL operators can be found in the literature.For instance, the monitoring of the accumulated electricity consumption on smart meters is easily described with integrals over a time window [11].The rest of the paper includes toy examples and a case study that amplifies the necessity of the new features we present.Further sections illustrate the methods we implement for approximating the validity domain we obtained in the last example, and outperform other monitoring tools that compute exact solution.
3 Extended Signal Temporal Logic

Boolean semantics
Signal Temporal Logic (STL) [1] focuses on specifying properties of continuoustime signals that represent the evolution of attributes of a cyber-physical system.Formally, a signal is a function x : T → R n , where the time domain T = [T start , T end ] is a closed real interval, and the value |x| = T end − T start is the duration of the signal.We refer to signal components using their own letters: x, y, z . . .∈ T → R.
Then, the grammar of STL can be defined as follows: P ∈ T → {true, false} is a predicate over the components of the input signal; U is the temporal operator until ; φ 1 U [t1,t2] φ 2 asserts that φ 2 becomes true in the future, with the delay between t 1 and t 2 , and φ 1 must hold until that point.
The semantics of an STL formula φ with respect to a signal x is its satisfaction signal ϕ ∈ T → {true, false}; for every time point t, ϕ (t) tells whether or not ϕ holds at that time point.That is, (x, t) |= φ ⇐⇒ φ (t).
Here, we use the non-strict semantics of until that requires the existence of a time point where both φ 1 and φ 2 hold.Somewhat more commonly used temporal operators F (eventually) and G (always) can be defined in terms of until : ] φ asserts that φ becomes true with a delay between t 1 and t 2 ; and G [t1,t2] φ asserts that for every time point between t 1 and t 2 time points in the future, φ holds.
Note that the atomic propositions of STL are predicate signals; they are assumed to be produced from sensor readings of some cyber-physical system, but the logic does not specify how exactly this happens.The satisfaction signals are piecewise-constant (since the range is {true, false}), and this is the only kind of signals STL monitors need to work with.

Quantitative semantics: new additions to STL
STL also admits robustness semantic that measures how far is an observed behaviour from satisfying/violating a specification.In this case, the semantics of a formula φ is interpreted as a piecewise-constant or piecewise-linear function from real-time, i.e., φ formula is a function φ : T → R. Thus, φ has real-valued switching points.This feature facilitates the inclusion of quantitative operators while treating real numbers 0 and 1 as Booleans (i.e., 0 ≡ false, 1 ≡ true).The quantitative interpretation of a φ formula involves that Boolean expressions are rewritten in terms of the composition of min/max operators: This quantitative interpretation should not be confused with the quantitative semantics for STL called space robustness (known popularly as simply robustness) defined in [12].This robustness denoted as ρ indicates by how much margin a given signal w satisfies an STL formula φ.It follows that positive and negative robustness values correspond to satisfaction and violation of the formula respectively.One can now see why while our quantitative interpretation requires ¬φ ≡ 1 − φ, robustness changes sign when negation is applied over a formula i.e. ρ(¬φ, w) = −ρ(φ, w) .Similarly to φ for STL in Boolean semantics, a φ formula may refer to an input signal x; apply a real-valued function f pointwise to the outputs its φ subformulas; be a temporal operator (U, F , G); or, additionally, apply an aggregate function over the sliding window (On [a,b] ).We define it as: We abuse the notation so that x is both a symbol referring to a component of an input signal and the corresponding real-valued function; similarly, f is both a function symbol and the corresponding function.
The necessity of expressing richer properties in STL motivates the incorporation of STL dialects to ParetoLib that extend the original set of operators with additional quantitative primitives.As previously sketched in the motivating examples, these predicates compute, for instance, the minimum/maximum values of a signal in a time window [3], the value of the derivative in a time point or the integral in a time interval [13], or counts the number of events in a sequence [5].Section 5 will show a case study that requires the count operator for learning patterns in labelled electrocardiograms (ECG).The following grammar summarises the list of operators that are supported by ParetoLib 2.0: Min, Max.Operations like minimum/maximum or integrals aggregate the values for a time window.The semantics of a ψ formula is a function ψ an interval of time with real lower bound to a dual value.A ψ formula is evaluated on an interval and does not have an output signal by itself.Instead, it supplies an aggregate operation that will be computed when evaluating the containing On formula or until formula.It should be possible to efficiently compute this aggregate operation over a sliding window, and it should preserve the chosen shape of signals.
Since we focus on piecewise-constant and piecewise-linear signals, the two operations that we can immediately offer are M in and M ax, which can be efficiently computed over a sliding window using the algorithm of D. Lemire [14,15], and preserve the piecewise-constant and piecewise-linear shapes.
At this point, we extend until operator in order to align with the quantitative semantics.For every time point t, we either associate an interval ending when φ 2 becomes non-zero (i.e., starts holding); or we report that no suitable end point was found.When such interval exists, we evaluate φ 1 on it.When the interval does not exist, we produce d.Formally: Standard STL eventually and always operators can be expressed in the new language as follows: Finally, standard STL until operator (denoted as U ST L ) is expressed as follows: The ϵ-count operator counts the number of positive events in a signal (i.e., the moments when a Boolean signal becomes true).The ϵ-count operator is not explicitly available at the STL grammar level, but it can be used for counting the number of positive intervals in a Boolean signal via the OracleEpsSTLe wrapper in ParetoLib.The ϵ-count operator grounds on the support of a Boolean signal x, denoted by supp(x), which is the topological closure of {t | x(t) ̸ = 0}, that is, the smallest closed set that contains all the points where the signal is not false.The ϵ-count operator, originally defined in [5], is inspired by the notions of ϵ-separated sets and ϵ-capacity proposed in [16].The ϵ-count operator contrasts to the Lebesgue's measure of subsets of R which entails a small measure for a signal whose support is the disjoint union of many intervals of almost-null measure which are quite far apart.Formally speaking, the ϵ-count operator is defined as: Definition 1 (ϵ-separated set and ϵ-count) Given a Boolean signal x, a set S of reals is ϵ-separated w.r.t.x if S ⊆ supp(x) and for every t, t ′ ∈ S with t ̸ = t ′ , it holds that The ϵ-count of a signal x is determined in a greedy manner with the following recursive equations: c ϵ (0) = 0 when it is applied to the constant signal 0, and c ϵ (x) = 1 + c ϵ (x ′ ) where x ′ (t) = 0 if t < ϵ + min(supp(x)) and x ′ (t) = x(t) otherwise.
Finally, the ϵ-count has some remaining properties: Derivatives, Integrals.The derivatives and integrals are calculated using the following mathematical expressions, where x stands for the complete signal, x τ is the value of the signal at time τ , ) represent the right (left) derivatives: Due to the implicit signal discretization, x τ , the value of the input signal at time τ , is obtained as x kδt where k is the k-th term of the array, δt is the time step determined by the sampling frequency and kδt ≤ τ < (k + 1)δt.The quantitative measurements of derivatives and integrals are then discretized and approximated by the following equations: The internal structure of ParetoLib 2.0 is depicted in Figure 6.The components that are highlighted in red represent the parts that are updated or completely new with respect to the previous releases of ParetoLib.The architecture of ParetoLib 2.0 is divided into three parts: the GUI, the STL monitor and the modules for mining the parameter valuations in parametric specifications.Depending on the type of STL specification the user introduces in the GUI (parametric vs non-parametric STL properties), Pare-toLib directly forwards the query to the STL monitor or starts the mining The functionalities for the internal modules of ParetoLib remain unchanged.These modules are responsible for learning the parameter valuations of parametric STL specifications.The mining algorithm converts the PSTL formulas into numerical STL instances that queries to StlEval [19].The current version of ParetoLib optimises the performance of this process by adding support to multi core CPU and type annotations and Cython directives [20] to compile the Python modules into C code during the installation phase.The rest of the code is also updated in order to support recent Python versions and standards.
The graphical user interface (GUI) allows to interact both with the mining library for parametric STL specifications and indirectly with the StlEval tool for numerical STL properties.The GUI simplifies the interaction with StlEval, but experienced users can also run StlEval through the OS terminal for evaluating non-parametric STL specifications.Similarly, ParetoLib 2.0 can also be imported as a Python library and run in the Python console, as illustrated by the code in Listings 1.
The versatility of ParetoLib 2.0 adapts the complexity of the tool to the user level: inexperienced users can interact with STL monitors via the GUI, while programmers and software engineers can code scripts and use ParetoLib 2.0 as a Python library.ParetoLib 2.0 provides an all-in-one bundle installer that ships the GUI as well as the DLL and binaries of StlEval for Windows and Linux.Except for the STL monitors (StlEval and AMT), ParetoLib is mainly written in Python.Hence, ParetoLib 2.0 becomes a cross-platform tool and can interact with the rest of the Python ecosystem (e.g., it can be easily imported as a library in a Jupyter Notebook).

StlEval
Given a non-parametric STL specification, StlEval implements efficient monitors that evaluate the properties on a real-time signal.StlEval has been upgraded in order to natively process new quantitative operators that count the number of events (ϵ-count) on a signal or calculate the derivatives and integrals.Previously, StlEval already included operators for computing other quantitative properties such as the maximum/minimum values of a signal in a time window.
StlEval receives a finite representation of the continuous real-time signal in the format of a comma separated values (CSV) array, which is interpreted as a constant piecewise function.Each row of the CSV file has a pair <timestamp, value>.Timestamps are integer values starting in 0, and the samples are floating numbers.Next, StlEval reads a text file with the STL specification.StlEval uses prefix notation, i.e., (F [0,p1] (≤ (abs (D x 0 )) p 2 )).

Mining algorithms in ParetoLib
ParetoLib converts the problem of mining parametric STL equations into solving a multi criteria optimisation problem.Our tool provides three algorithms for computing the Pareto front that will return the valuations that satisfy the parametric STL expressions.All the methods replace the variables in PSTL equations by numerical STL instances that are query to an external STL monitor.Based on the Boolean answer the STL monitor replies, the mining algorithms guide the discovery of the validity domain.
The first method, named BBMJ19 [4], calculates a single Pareto front while the second method, named BDMJ20 [5], obtains the intersection of two Pareto fronts.The BBMJ19 method was originally defined in [21,22] and shipped in the first release of ParetoLib in [4], while method BDMJ20 was incorporated in later versions.These algorithms assume that the STL properties are monotonic and use the concepts of upset, downset and Pareto front.Recently, method BMNN23 [7] has also been incorporated to ParetoLib, which allows for mining parametric STL specifications that do not require monotonicity assumptions of the validity domain.It provides two types of algorithms that partition the search space of a PSTL property into smaller boxes and compute the validity domain for the parameter valuations based on statistical constraints.
The computational costs of BBMJ19, BDMJ20 and BMNN23 depend on the accuracy requirements for the computation of the validity domain.For instance, BBMJ19 categorises the search space into (in)valid and unexplored area.The volume contained in the unexplored area decreases below δ% of the initial volume where m is the number of parameters (or dimensions) in a PSTL formula and κ m is a constant number in the interval [2m − 4, 2 m − 3] which represents the number of boxes created during the partitioning step.BMNN23 relies on a partition strategy which can use a fixed or dynamic cell size.In the worst case, it costs O(M m ) • O(n), where m is the number of parameters (or dimensions) in a PSTL formula, and M is a constant that depends on the granularity of the cell size.As BMNN23 uses statistical properties for partitioning the validity domain, the method requires n input signals or samples.
Finally, the evaluation of non-parametric STL properties is commonly handled by STL monitors in linear time with respect to the length of the input signal.
Partial order on R n , Upset, Downset and Pareto Front.Given two vectors p, q ∈ R n , vector p is lower than q, denoted by p ≤ q, if ∀i, The boundary consisting of all the minimal elements of an upset (or all the maximal elements of a downset) is called a Pareto front in the field of multicriteria optimisation.The box between two vectors x and x with x ≤ x is ⌊x, x⌉ = {y | x ≤ y ≤ x}.
Both algorithms in ParetoLib deal with user-given boxes.Consider the box B from Figure 7.The first Pareto front P 1 separates the upset U 1 and the downset D 1 (Figure 7a), while the second Pareto front P 2 separates the upset U 2 and the downset D 2 (Figure 7b).The algorithms rely on the existence of two oracle functions defined over the box B. Given a query point x contained Assuming the existence of a Pareto front, the BBMJ19 algorithm takes an oracle and a box to approximately compute the upset and downset.In particular, U 1 and D 1 are approximated using oracle o 1 and box B. Similarly, U 2 and D 2 are computed using oracle o 2 and box B. Please note that BBMJ19 requires that the oracles are monotonic.In these examples, o 1 is increasing: for p, q ∈ B with p ≤ q, it applies that o 1 (q) ≥ o 1 (p).Similarly, the oracle o 2 is decreasing: for p, q ∈ B with p ≤ q, it applies that o 2 (q) ≤ o 2 (p).
Conversely, the algorithm BDMJ20 computes the intersection of two Pareto fronts.This method applies when we are dealing with one increasing oracle and another decreasing oracle: it involves the intersection of the upset of the increasing oracle and the downset of the decreasing oracle.For example, it can approximately compute U 1 ∩ D 2 (in Figure 7c) using oracles o 1 , o 2 and the box B. It computes approximately but directly, the intersection of U 1 and D 2 .A less efficient way is to separately compute U 1 and D 2 and then intersect them.

Graphical User Interface
The main GUI window (Figure 8) is implemented using PyQt5.Extra libraries such as Matplotlib, Pandas and Seaborn help for displaying the time series or the parameter valuation results (Figures 11-13).The GUI starts when running the command python ParetoLib/GUI/GUI.py.
The tool receives three input files: 1) a finite representation of the continuous signal, 2) a (parametric) STL specification, and 3) optionally, the list of parameter names.Signal files are in CSV format, and the rest of input files are Then, the user chooses the values of the parameter range (if any) and configure the execution in the option area.For instance, the user can select the type of interpolation, whether the STL specification is parametric or not, or the mining algorithm in case it is parametric (mining algorithms are BBMJ19 [4], BDMJ20 [5] or BMNN23 [7]).In case of running the BDMJ20 method, the user must provide two STL specification files: one for each oracle, as described in Sub-section 4.3.
Running the evaluation of a non-parametric STL formula opens a pop-up window with a message saying if the property is satisfied or violated.If Pare-toLib receives a parametric STL formula, it learns the parameter valuations and returns a window that plots the results (Figures 11-13).
Finally, the GUI offers a menu bar with useful options.For instance, it includes buttons for creating, loading and saving ParetoLib projects in a JSON file: this file will contain references to the input file paths and current configuration (e.g., mining method).In case that we desire to compare several Pareto fronts, the classification button will show the most characteristic element of the green region for each validity domain.This feature requires a list of validity domains in ParetoLib file format (see line 26 in Listing 1).Then, it internally computes the Haussdorf distance among the set of points.

Example
Using the new version of StlEval, ParetoLib can solve non-parametric and parametric formulas involving quantitative operators via the GUI or the Python terminal.As previously stated in the motivating examples in Section  2, some properties in cyber-physical systems are easier expressed in terms of these quantitative primitives than using other STL approaches.Let's consider again two common scenarios: decaying signals (Figure 9) and periodic signals (Figure 10).Detecting whether the signal stabilises or not is equivalent to check that the wave is almost flat, i.e., the derivative bounds are near zero in the future.The usage of a parametric STL expression allows the inspection of the stabilisation property over a period of time and oscillation amplitudes: The parameters in the PSTL equation are p 1 and p 2 , x 0 is the first component of the input signal, D is the derivative operator and | • | computes the absolute value3 .The parameter p 1 corresponds to the upper time bound of the sliding window, and p 2 corresponds to the oscillation amplitude.As a result, ParetoLib returns the parameter valuation (Figure 11) that shows the combinations of parameters p 1 and p 2 that satisfy the property (green region) or falsify it (red region).Figure 12 shows the Boolean signal produced replacing p 1 and p 2 by 100 and 0.05 in Equation 3. It also textually reports if the STL property is satisfied at time instant 0 (i.e., True).
Similarly, an example of parametric expression involving integrals is Equation 4. In this case, On [0,p1] refers to the integration interval for the I operator.If p 1 is multiple of the periodicity of oscillation of the signal around x 0 (t) = 0, then the integral should also be zero.Figure 13 shows the parameter valuation for Equation 4 when evaluated over the triangular signal (Figure 10). Figure 14 shows the integral of the triangular signal (blue) over time.The Boolean signal (orange) is produced replacing p 1 and p 2 by 4000 and 50 in Equation 4.
The files for running the previous examples are located in the Test folder of the git repository of ParetoLib.These files include the STL specifications, the signals and a file containing the names of the parameter variables.Depending on the monitoring tool (i.e., StlEval or AMT), the file format of the STL specification and signal may change.Additional examples are located in the doc folder, as well as a video demo.

Case Study
The goal of this section is to illustrate the new features of ParetoLib.In particular, we will show the benefits of the quantitative operators for simplifying  4Fig.14: Boolean signal resulting from substitution of (p 1 , p 2 ) = (4000, 50) in Equation 4the specification of STL properties as well as the computational speed up compared to the first release of ParetoLib.
In this case study, we design a disease detector for electrocardiograms (ECG) using parametric STL.Informally speaking, an ECG is composed of a sequence of approximately flat regions followed by a pulse or heat beat.If the patient suffers from arrhythmia or other kind of heart diseases, the pulse exhibits an anomalous shape.In the following experiments, we use the MIT-BIH Arrhythmia Database of Physionet [23,24], a public library that contains several annotated ECG's with thousands of pulses per sample.A portion of ECG 221 is depicted as in Figure 15 where the signal (blue) and the annotations (orange) come from the database.The annotations for the normal peaks are modelled into a labelling signal that is 1 when a normal peak occurs and 0 everywhere else.
Our aim is to develop a pattern predictor (i.e., classifier) that identifies normal peaks and mimic the labelling a doctor would manually make for an input ECG.The learning procedure presented in [5] relies on the ϵ-count operator for mining a classifier that minimises the number of mismatches (either false positives, f + , or false negatives, f − ) with respect to the annotations of the samples in the training set.The inferred classifier consists of a PSTL formula and the solution of the parameter space.Obtaining a classifier that mimics a doctor labelling is hence translated into an optimisation problem that minimises both f + and f − .The problem is solvable by the BDMJ20 method in Sub-section 4.3 that deals with two, maybe opposing, optimisation criteria.
Firstly, let's introduce some formal definitions and notation.A labelled signal (s, λ s ) is a pair of signal s and labelling signal λ s .We use S to denote the given set of labelled signals.We use IPPP to mean Increasing Parametric Pattern Predictor (defined formally in [5]).To put it in simple terms, increasing the parameter values for an IPPP makes the true predictions become more common.We aim at learning parameters p for an IPPP Ψ p so that for every given labelled signal (s, λ s ), the labelling signals Ψ p (s) and λ s should match together as much as possible.We measure two kind of mismatches by measuring "how often" the two following signals are true.The false positive signal ¬λ s ∧ Ψ p (s) indicates when the predictor predicts an occurrence when there is none.The false negative signal λ s ∧ ¬Ψ p (s) indicates when the predictor misses an actual occurrence.
Given bounds f + , f − on the allowed ϵ-count of false positives and false negatives, we are interested in the following three sets: For convenience, we call them respectively the positive, negative and intersection solution sets.In addition, we are interested in a relaxed version of the identification problem for false positive bounding, by tolerating a difference of σ time units in matching the labels.This can be done by replacing λ s with the signal4 F [−σ,σ] λ s in (5).More concretely, the solution set of the corresponding σ-relaxed problem is: Hence, the corresponding relaxed version of the intersection solution set (7) is Note that Dom+(Ψ, S, f + ) and Dom+ σ (Ψ, S, f + ) are downsets and Dom−(Ψ, S, f − ) is an upset because Ψ is increasing (recall definitions of upset and downset from Sub-section 4.3).The sets DomInter(Ψ, S, f + , f − ) and DomInter σ (Ψ, S, f + , f − ) are intersections of an upset and a downset, which we can compute from the intersection of two Pareto fronts using BDMJ20 algorithm.
It is also of great interest to compute the set of couples (f + , f − ), called set of feasible error bounds, for which a solution exists: The set P(Ψ, S) is an upset and its minimal elements form a Pareto front.We compute it via membership-queries for couples (f + , f − ).They are done via non-emptiness checking of DomInter σ (Ψ, S, f + , f − ) which is an easier problem than computing the whole set.Equation 10gives a simple and rough characterisation of a normal ECG peak using min/max operators of extended STL [3].For the sake of readability, the syntax has been slightly simplified by replacing On Note that if one increases p 1 , p 2 or p 3 , the property is easier to achieve.Alternatively, Equation 11 characterises ECG pulses in terms of derivatives and integrals in a simpler and more readable way.
For ECG 221, the predictor in Equation 10 can match the labelling with no false negatives (f − = 0) and only a single false positive (f + = 1): it identifies the blue peak in the right hand side of Figure 15 as a valid pulse.We denote the singleton signal set containing the labelled ECG 221 signal as S 221 .Table 1 shows the execution time that ParetoLib requires for computing the solution set (DomInter σ (Ψ ch1 , S 221 , f + = 1, f − = 0)) of the parametric space (p 1 , p 2 , p 3 ) with different accuracy.The value of V δ represents the percentage of the parametric space that remains unexplored.For instance, the green area in Figure 16 represents the solution set of the parameters when the unexplored area is less than 0.01%.The Cython code is around 3% faster than pure Python: the evaluation of the STL expressions, which is the most time consuming part of the mining method, was already externalised to StlEval in previous versions of ParetoLib.Additionally, ParetoLib 2.0 supports the parallel execution of the BDMJ20 method in multicore CPU's, while ParetoLib 1.0 did not.The experiments are run in a Intel(R) Core(TM) i5-3570K CPU @3.40GHz 16GB, Ubuntu 22.04 and Python 3.10.
Equation 10 is also evaluated for labelling pulses on ECG 100 and ECG 123 with different success.We denote the singleton signal sets containing the labelled ECG 100 and ECG 123 signals as S 100 and S 123 respectively.Table 2 compares the number of mismatches, either false positives or false negatives, between the original labelling and the one produced by Equation 10.For the particular case of ECG 100, our solution provides 33 false positives.According to the Pareto front (corresponding to the upset P(Ψ ch1 , S 100 )) in Figure 17, it is impossible to obtain a better solution set using the current specification.This issue does not come from a wrong definition of peak, but from pulses that are separated further than usual.See the first pulse after 2000 time units in Figure 18.It is called an Atrial Premature Beat which should not be considered a normal peak.Equation 12enriches Equation 10with explicit information about the distance between peaks, which improves the accuracy of our peak detector and reduces the number of errors to 3 false positives and 1 false negative.peak is an alias for Equation 10. 6 Related work Similar tools for offline monitoring of cyber-physical systems are AMT 2.0 [18], S-Taliro [25] and Breach [26].AMT 2.0 is a Java tool while S-Taliro and Breach are MATLAB/Simulink toolboxes.AMT 2.0 analyses input traces with extended Signal Temporal Logic (xSTL), which combines STL and Timed Regular Expressions (TRE).On the other hand, S-Taliro and Breach include an explicit model of the cyber-physical system for simulating traces.S-Taliro is specialised in falsification of temporal logic properties by finding trajectories with minimum robustness.Breach allows the exhaustive inspection of the cyber-physical model by systematically varying configuration parameters.Next, py-stl [27,28] implements a similar approach to our BBMJ19 method for mining parametric STL equations.The authors of py-stl apply it to compute a family of distance metrics for a set of monotonic specifications that are mined from time-series learning [29].Finally, MiniPaSTeL [30] implements the BMNN23 method for mining STL parameters [7].It uses RTAMT [31] as monitoring tool.However, these tools include neither the quantitative operators nor a full support for the specification mining methods presented in this paper.Additionally, ParetoLib 2.0 has an adapter for externalising the run of non-parametric STL queries in AMT 2.0 and is also prepared for connecting with the Matlab environment.Similarly to py-stl, ParetoLib 2.0 supports the comparison of the inferred validity domains for mined PSTL specifications.Our tool can compute the Haussdorf distances among a set of validity domains and return the characteristic point for each of them.
Recently, tools for online monitoring of systems that cannot be statically analysed have appeared.RTAMT [31] supports the online diagnosis with STL and IA-STL, an interface-aware extension for defining interface properties about input/output signals of a cyber-physical systems.Finally, RTLola [32] (previously named StreamLab) and TeSSLa [33] complete the state of the art.These tools are focused on evaluating real-time streams instead of signals.

Conclusion
This paper introduces the new features of ParetoLib 2.0, a Python tool for the evaluation and parameter synthesis of Signal Temporal Logic specifications (STL).The main changes w.r.t. the previous version of the tool consist of 1) a graphical user interface that simplifies the usage to the end users, 2) the support of additional quantitative operators that involve counting of events (ϵ-count), trends (derivatives), or accumulations (integrals), and 3) the implementation of new mining methods.Besides, we have optimised the performance of the library for mining the parameter valuations of parametric STL specifications by completing the multi core support and compiling the kernel modules into C code.The compilation of the internal Python modules into C code is transparent to the end users as the transformation is automatically executed during the installation of the Python library.Finally, we introduce the Haussdorf distance in order to compare the inference of validity domains of PSTL specifications.
As future work, we propose to include more options in the graphical user interface that are already available in ParetoLib as command-line options (e.g., the selection of parallel computations or the STL engine).On the other hand, we want to include other types of interpolation beyond the constant pairwise (e.g.linear).We plan to add new logic operators such as the probability operator Pr ∼λ φ, where ∼∈ {<, ≤, ≥, >} and λ ∈ [0, 1].Finally, we will implement a natural language processor for writing STL specifications in natural language.

Fig. 2 :
Fig. 2: Sine wave x(t), its maximum over the window [t, t + 200], and whether x(t) is a local maximum on the interval [t, t + 200].

Fig. 4 :
Fig. 4: Validity domain for signal w 10 (t) and the formula with integral.

Fig. 5 :
Fig. 5: Approximation of the validity domain for signal w 10 (t) and the formula with integral.
[a,b] M ax x by M ax [a,b] x.Equation 10 is equal to 1 if the maximum of x on [t − p 1 , t + p 1 ] is above −p 3 , and its variation is within the bound p 2 on [t − c, t − p 1 ] and on [t + p 1 , t + c].The parameter domains are p 1 ∈ [0, 70], p 2 ∈ [0, 1] and p 3 ∈ [−1, 0].Here, c = 70 is a constant representing an upper limit on p 1 .

Table 1 :
Execution time (seconds) of the solution set of the parametric space for different scenarios.

Table 2 :
Execution time (seconds) and number of f − /f + of the solution set of the parametric space for different ECGs.