Abstract
Hybrid system falsification is an important quality assurance method for cyber-physical systems, offering better scalability and practical feasibility than exhaustive verification. Given a desired temporal specification, falsification tries to find a violating input rather than to prove the absence of one. State-of-the-art falsification approaches often employ stochastic hill-climbing optimization that minimizes the degree of satisfaction of the temporal specification, given by its quantitative robust semantics. However, it has been shown that the performance of falsification can be severely affected by the so-called scale problem, related to the different scales of the signals used in the specification (e.g., rpm and speed): in the robustness computation, the contribution of one signal can be masked by another. In this paper, we propose a novel approach to tackle this problem. We first introduce a new robustness definition, called QB-Robustness, which combines classical Boolean satisfaction and quantitative robustness. We prove that QB-Robustness can be used to judge the satisfaction of the specification while avoiding the scale problem in its computation. QB-Robustness is exploited by a falsification approach based on Monte Carlo Tree Search over the structure of the formal specification. First, tree traversal identifies the subformulas for which the quantitative robustness needs to be computed. Then, on the leaves, numerical hill-climbing optimization is performed, aiming to falsify those subformulas. Our in-depth evaluation on multiple benchmarks demonstrates that our approach achieves better falsification results than state-of-the-art falsification approaches guided by the classical quantitative robustness, and that it is largely unaffected by the scale problem.
This work is supported in part by JSPS KAKENHI Grant No. 20H04168, 19K24348, 19H04086, JST-Mirai Program Grant No. JPMJMI20B8, Japan. Lei Ma is also supported by the Canada CIFAR AI Program and the Natural Sciences and Engineering Research Council of Canada. Paolo Arcaini and Ichiro Hasuo are supported by the ERATO HASUO Metamathematics for Systems Design Project (No. JPMJER1603), JST.
1 Introduction
Cyber-Physical Systems (CPS) are hybrid systems that combine physical systems (with continuous dynamics) and digital controllers (which are inherently discrete). Being often safety-critical, their quality assurance is of great importance and is widely investigated by both academia and industry. The continuous dynamics of hybrid systems leads to infinite search spaces, making their verification often extremely difficult.
Falsification has been proposed as a more practically feasible approach that tackles the dual problem of verification: instead of exhaustively proving a property, falsification intends to uncover the existence of its violation with counterexamples. Formally, the problem is defined as follows. Given a model \(\mathcal {M}\) taking an input signal \(\mathbf {u} \) and outputting a signal \(\mathcal {M}(\mathbf {u})\), and a specification \(\varphi \) (a temporal formula), the falsification problem consists in finding a falsifying input, i.e., an input signal \(\mathbf {u} \) such that the corresponding output \(\mathcal {M}(\mathbf {u})\) violates \(\varphi \).
The most pursued and successful approach to the falsification problem consists in turning it into an optimization problem; we call it optimization-based falsification. This is possible thanks to the quantitative robust semantics of temporal formulas [14, 19]. Robust semantics extends the classical Boolean satisfaction relation \(\mathbf{w}\models \varphi \) in the following way: it assigns a value \([\![\mathbf{w}, \varphi ]\!] \in \mathbb{R}\cup \{\infty ,-\infty \}\) (i.e., a robustness) that tells not only whether \(\varphi \) is satisfied or violated (by its sign), but also how robustly the formula is satisfied or violated.
Optimization-based falsification approaches adopt stochastic hill-climbing optimization strategies to generate inputs with decreasing robustness, terminating when they find an input with negative robustness, i.e., a falsifying input that triggers the violation of the specification \(\varphi \). Different optimization-based falsification algorithms have been proposed (see [26] for a survey), and mature tools (e.g., Breach [13] and S-TaLiRo [4]) have also been developed.
The scale problem is a recognized issue in optimization-based falsification [21, 40], which can arise when multiple signals with different scales are present in the specification. It is due to the computation of the robust semantics of Boolean connectives, i.e., to the way in which the robustness values of different subformulas are compared and aggregated: such computation is problematic in the presence of signals whose values have different orders of magnitude.
Example 1
As a very simple example, let us consider the formula \(\varphi \equiv \Box _{[0,30]}(\varphi _1 \wedge \varphi _2)\), with \(\varphi _1 \equiv gear < 6\) and \(\varphi _2 \equiv speed < 130\). It is apparent that \(\varphi _1\) is always satisfied (in any car model with 5 gears); it has been added to the specification as a redundant check. According to robust semantics, the Boolean connective \(\wedge \) is interpreted by the minimum \(\sqcap \), and the "always" operator \(\Box _{[0,30]}\) is interpreted by the infimum over [0, 30]; the robustness of an atomic formula \(f(\mathbf{x}) < c\) is given by the margin \(c - f(\mathbf{x})\). Therefore, the robustness of \(\varphi \) under the signal \(( gear , speed )\), where \( gear , speed :[0,30]\rightarrow \mathbb{R}\), is \(\inf _{t\in [0,30]}\bigl ((6 - gear (t)) \sqcap (130 - speed (t))\bigr )\). Note that the robustness of \(\varphi _1\) is always in the order of units, while the robustness of \(\varphi _2\) is, in general, in the order of tens. It is not difficult to see that, if both \(\varphi _1\) and \(\varphi _2\) are satisfied, the robustness of \(\varphi \) only depends on \(\varphi _1\) (because of the minimum in the robust semantics of the logical connective). In this case, we say that \(\varphi _1\) masks \(\varphi _2\). A falsification approach relying on robustness will then be misled during the search. Note that, in this particular case, the only way to falsify \(\varphi \) is to falsify \(\varphi _2\), because \(\varphi _1\) is always satisfied; therefore, falsifying this relatively simple formula can be extremely difficult for state-of-the-art optimization-based falsification approaches (as we will show and have confirmed in the experiments).
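The masking effect described above can be reproduced numerically. The following sketch evaluates the robustness of \(\varphi \) on sampled traces (the traces themselves are assumptions introduced only for illustration): even when the speed gets close to the threshold 130, the overall robustness does not change, because the small-scale gear margin dominates the minimum.

```python
# Robustness of always_[0,30](gear < 6 and speed < 130) on sampled traces:
# the infimum over time of min(6 - gear(t), 130 - speed(t)).
def rob(gear, speed):
    return min(min(6 - g, 130 - s) for g, s in zip(gear, speed))

ts = range(31)
gear = [min(5, 1 + t // 6) for t in ts]   # hypothetical gear trace, values 1..5
slow = [40 + 2 * t for t in ts]           # hypothetical speed trace, peaks at 100
fast = [40 + 2.9 * t for t in ts]         # hypothetical speed trace, peaks at 127

# Both robustness values equal the gear margin 6 - 5 = 1: the speed signal,
# even when close to 130, is masked by the small-scale gear margin, so a
# hill-climbing optimizer receives no guidance toward violating speed < 130.
print(rob(gear, slow), rob(gear, fast))
```

Here the second trace is much closer to violating \(\varphi _2\) than the first, yet both receive the same robustness value, which is precisely the guidance loss caused by the scale problem.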
In this paper, we propose a novel approach to tackle this problem. Our intuition is that we should avoid comparing the robustness values of different subformulas, so that one subformula does not mask the contribution of another.
To achieve this, we first propose a new way of computing the satisfaction of a formula that combines quantitative robust semantics and Boolean semantics. We name the new semantics QB-Robustness. QB-Robustness, for each type of formula \(\varphi \), requires selecting a subformula \(\varphi _k\) among its subformulas \(\{\varphi _1, \ldots , \varphi _K\}\). For \(\varphi _k\), the quantitative robust semantics is computed, while for the other subformulas the Boolean semantics is computed. Therefore, the computation of QB-Robustness requires identifying a path \(\varSigma \) along the parse tree of the formula \(\varphi \); the visited subformulas are those for which the quantitative robustness is computed. We prove that QB-Robustness, independently of the selected \(\varSigma \), is equivalent (in terms of sign and satisfaction) to the quantitative robust semantics (and also to the Boolean one).
In general, QB-Robustness is a useful tool for avoiding the scale problem in falsification: by definition, the quantitative robustness values of different subformulas are never compared, thus removing the main cause of the scale problem. It therefore makes sense to use it for guiding the optimization-based falsification process. However, QB-Robustness requires choosing a particular sequence \(\varSigma \) of subformulas for which to compute the quantitative robustness, and it is relatively easy to show that some sequences provide better guidance than others to the falsification search. Considering the previous example, if \(\varSigma \) contains \(\varphi _1\), the quantitative robustness of \(\varphi _1\) provides hardly any guidance (i.e., only small variations in the robustness values are observed). On the other hand, if \(\varSigma \) contains \(\varphi _2\), the quantitative robustness shows larger variations, providing more effective guidance to the search.
The key problem is then how to select the best \(\varSigma \), i.e., the one that makes the hill-climbing optimization used in falsification most effective. Although it is often difficult to know the best \(\varSigma \) in advance, it is still possible to learn it by observing the sampling results obtained with different sequences. Based on this intuition, we propose a novel falsification approach that identifies the sequences \(\varSigma \) that are more likely to be efficient, and uses them in new falsification trials. Our approach can be seen as an instantiation of the classical Monte Carlo Tree Search (MCTS) method [8, 28], which efficiently tackles the exploration-exploitation trade-off. In our context, exploration consists in incrementally constructing the tree that represents all the possible sequences, and exploitation consists in selecting the best \(\varSigma \) and running optimization-based falsification in which QB-Robustness with \(\varSigma \) is used.
Overall, the major contributions of this paper are summarized as follows:

We propose a novel semantics (QB-Robustness) for STL formulas that combines quantitative robustness and Boolean satisfaction. We prove that QB-Robustness can be used to assess the satisfaction of STL formulas;

We define a falsification approach based on MCTS that exploits QB-Robustness to address the scale problem;

We implement the approach in the tool ForeSee and perform an in-depth evaluation on it, demonstrating the effectiveness and advantages of our approach compared with the state of the art.
Paper Structure. In Sect. 2, we introduce the preliminaries of optimization-based falsification. In Sect. 3, we introduce the novel STL semantics QB-Robustness, and, in Sect. 4, we describe the MCTS-based falsification approach that uses QB-Robustness. In Sect. 5, we describe the experiments and evaluation results. Finally, we discuss the work most relevant to ours in Sect. 6, and conclude the paper in Sect. 7.
2 Preliminaries
In this section, we briefly review the falsification framework based on robust semantics of temporal logic [14].
Let \(T\in \mathbb {R}_{+}\) be a positive real. An Mdimensional signal with a time horizon T is a function \(\mathbf {w} :[0,T]\rightarrow \mathbb {R}^{M}\). We treat the system model as a black box, i.e., its behaviors are only observed from inputs and their corresponding outputs. Formally, a system model, with Mdimensional input and Ndimensional output, is a function \(\mathcal {M}\) that takes an input signal \(\mathbf {u} :[0,T]\rightarrow \mathbb {R}^{M}\) and returns a signal \(\mathcal {M}(\mathbf {u}):[0,T]\rightarrow \mathbb {R}^{N}\). Here the common time horizon \(T\in \mathbb {R}_{+}\) is arbitrary.
Definition 1
(STL Syntax). We fix a set \(\mathbf {Var}\) of variables. In Signal Temporal Logic (\(\text {STL}\)), atomic propositions and formulas are defined as follows, respectively: \( \alpha \,::\equiv \, f(x_1, \dots , x_N) > 0 \), and \( \varphi \,::\equiv \, \alpha \mid \bot \mid \lnot \varphi \mid \bigwedge \varphi \mid \bigvee \varphi \mid \Box _{I}\varphi \mid \Diamond _{I}\varphi \mid \varphi \mathbin {\mathcal {U}_{I}} \varphi \) Here f is an Nary function \(f:\mathbb {R}^{N}\rightarrow \mathbb {R}\), \(x_1, \dots , x_N \in \mathbf {Var}\), and I is a closed nonsingular interval in \(\mathbb {R}_{\ge 0}\), i.e., \(I=[a,b]\) or \([a, \infty )\), where \(a,b \in \mathbb {R}\) and \(a<b\). \(\Box , \Diamond \) and \(\mathcal {U}\) are temporal operators, which are usually known as always, eventually and until respectively. The always operator \(\Box \) and eventually operator \(\Diamond \) can also be considered as special cases of the until operator \(\mathcal {U}\), where \(\Diamond _{I}\varphi \equiv \top \mathbin {\mathcal {U}_{I}}\varphi \) and \(\Box _{I}\varphi \equiv \lnot \Diamond _{I}\lnot \varphi \). Other common connectives such as \(\rightarrow ,\top \) are introduced as syntactic sugar: \(\top \equiv \lnot \bot \), \(\varphi _1\rightarrow \varphi _2 \equiv \lnot \varphi _1 \vee \varphi _2\).
Definition 2
(Quantitative Robust Semantics). Let \(\mathbf{w} :[0,T]\rightarrow \mathbb{R}^{N}\) be an N-dimensional signal, and \(t\in [0,T)\). The t-shift \(\mathbf{w} ^t\) of \(\mathbf{w} \) is the signal \(\mathbf{w} ^t:[0,T-t]\rightarrow \mathbb{R}^{N}\) defined by \(\mathbf{w} ^t(t') :=\mathbf{w} (t+t')\). Let \(\varphi \) be an \(\text {STL}\) formula. We define the robustness \({ [\![\mathbf{w}, \varphi ]\!]} \in \mathbb{R}\cup \{\infty ,-\infty \}\) as follows, by induction on the construction of formulas; \(\inf \) and \(\sup \) denote infimums and supremums of real numbers, respectively, and \(\sqcap \) denotes binary minimum:
\[
\begin{aligned}
[\![\mathbf{w}, f(x_1, \dots , x_N) > 0 ]\!] &:= f\bigl (\mathbf{w}(0)\bigr ) &
[\![\mathbf{w}, \bot ]\!] &:= -\infty \\
[\![\mathbf{w}, \lnot \varphi ]\!] &:= -[\![\mathbf{w}, \varphi ]\!] &
[\![\mathbf{w}, \textstyle \bigwedge _i \varphi _i ]\!] &:= \min _i\, [\![\mathbf{w}, \varphi _i ]\!] \\
[\![\mathbf{w}, \textstyle \bigvee _i \varphi _i ]\!] &:= \max _i\, [\![\mathbf{w}, \varphi _i ]\!] &
[\![\mathbf{w}, \Box _{I}\varphi ]\!] &:= \inf _{t\in I\cap [0,T]} [\![\mathbf{w} ^t, \varphi ]\!] \\
[\![\mathbf{w}, \Diamond _{I}\varphi ]\!] &:= \sup _{t\in I\cap [0,T]} [\![\mathbf{w} ^t, \varphi ]\!] &
[\![\mathbf{w}, \varphi _1 \mathbin {\mathcal {U}_{I}} \varphi _2 ]\!] &:= \sup _{t\in I\cap [0,T]} \Bigl ( [\![\mathbf{w} ^t, \varphi _2 ]\!] \sqcap \inf _{t'\in [0,t)} [\![\mathbf{w} ^{t'}, \varphi _1 ]\!] \Bigr )
\end{aligned}
\]
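The inductive definition above can be sketched as a small interpreter over discrete-time signals. The tuple encoding of formulas, the representation of a signal as a list of state dictionaries with unit time step, and the concrete trace are all assumptions introduced for illustration; they are not part of the paper's formalism.

```python
import math

# Discrete-time sketch of the robust semantics. A signal is a list of state
# dicts (unit time step); a formula is a nested tuple:
# ('atom', f) with robustness f(w[0]); ('not', p); ('and'/'or', p, q, ...);
# ('always'/'ev', (a, b), p); ('until', (a, b), p, q).
def rob(w, phi):
    op = phi[0]
    if op == 'atom':
        return phi[1](w[0])                      # f evaluated at time 0
    if op == 'not':
        return -rob(w, phi[1])
    if op == 'and':
        return min(rob(w, p) for p in phi[1:])
    if op == 'or':
        return max(rob(w, p) for p in phi[1:])
    a, b = phi[1]
    ts = [t for t in range(len(w)) if a <= t <= b]
    if op == 'always':
        return min(rob(w[t:], phi[2]) for t in ts)
    if op == 'ev':
        return max(rob(w[t:], phi[2]) for t in ts)
    if op == 'until':                            # sup_t (rob(q) min inf_{t'<t} rob(p))
        return max(min(rob(w[t:], phi[3]),
                       min((rob(w[u:], phi[2]) for u in range(t)),
                           default=math.inf))
                   for t in ts)
    raise ValueError(op)

# The formula of Example 1 on a short assumed trace: the robustness is the
# time-infimum of the pointwise minimum of the two margins.
w = [{'gear': 3, 'speed': 100}, {'gear': 5, 'speed': 125}, {'gear': 4, 'speed': 110}]
spec = ('always', (0, 2), ('and',
        ('atom', lambda s: 6 - s['gear']),
        ('atom', lambda s: 130 - s['speed'])))
print(rob(w, spec))   # min(min(3,30), min(1,5), min(2,20)) = 1
```

Note how the result, 1, is entirely determined by the gear margin, matching the masking phenomenon of Example 1.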
The original \(\text {STL}\) semantics is Boolean, given by a binary relation \(\models \) between signals and formulas. The robust semantics refines the Boolean one in the following sense: \( [\![\mathbf {w},\varphi ]\!] > 0\) implies \(\mathbf {w} \models \varphi \), and \([\![\mathbf {w},\varphi ]\!] < 0\) implies \(\mathbf {w} \not \models \varphi \), see [19, Prop. 16].
2.1 Hill-Climbing-Guided Falsification
So far, the falsification problem has received extensive industrial and academic attention, and the approach based on hill-climbing optimization is an established field: see [2,3,4, 10, 13,14,15, 17, 26, 29, 36,37,39, 42] and the tools Breach [13] and S-TaLiRo [4]. We formulate the problem and the methodology here, for later use in describing our falsification approach.
Definition 3
(Falsifying Input). Let \(\mathcal {M}\) be a system model, and \(\varphi \) be an STL formula. A signal \(\mathbf {u} :[0,T]\rightarrow \mathbb {R}^{\mathbf {Var}}\) is a falsifying input if \({ [\![\mathcal {M}(\mathbf {u}), \varphi ]\!]}<0\); the latter implies \(\mathcal {M}(\mathbf {u})\not \models \varphi \).
The use of the quantitative robust semantics \({ [\![\mathcal {M}(\mathbf {u}), \varphi ]\!]}\in \mathbb {R}\cup \{\infty ,-\infty \}\) in the above problem enables the use of hill-climbing optimization.
Definition 4
(Hill-Climbing-Guided Falsification). Assume the setting in Definition 3. For finding a falsifying input, the methodology of hill-climbing-guided falsification is presented in Algorithm 1. Here, the function \(\textsc {HillClimb}\) makes a guess of an input signal \(\mathbf {u} '\), aiming at minimizing the robustness \({ [\![\mathcal {M}(\mathbf {u} '), \varphi ]\!]}\). It does so by learning from the sampling history H, which contains the previous observations of input signals and their corresponding robustness values.
The \(\textsc {HillClimb}\) function can be designed based on various stochastic optimization algorithms. Typically, in the early phase of the optimization, new inputs are proposed by random sampling; as the sampling history grows, the algorithm applies various metaheuristic strategies to achieve the optimization goal efficiently. Examples of such algorithms include the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [7] (used in our experiments), Simulated Annealing, and Global Nelder-Mead [32].
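The Algorithm-1-style loop can be sketched as follows. The toy model, the specification robustness, and the naive Gaussian-perturbation hill climber (standing in for CMA-ES or the other metaheuristics mentioned above) are all assumptions introduced for illustration.

```python
import random

def falsify(model, spec_rob, dim, budget=2000, step=0.1, seed=0):
    """Algorithm-1-style loop (sketch): propose inputs, keep the best so far,
    stop as soon as the robustness becomes negative (falsifying input found)."""
    rng = random.Random(seed)
    best_u = [rng.uniform(0, 1) for _ in range(dim)]   # initial random sample
    best_r = spec_rob(model(best_u))
    for _ in range(budget):
        if best_r < 0:
            return best_u, best_r                      # falsified: stop
        u = [x + rng.gauss(0, step) for x in best_u]   # local perturbation
        r = spec_rob(model(u))
        if r < best_r:                                 # hill-climbing: descend
            best_u, best_r = u, r
    return best_u, best_r                              # budget exhausted

# Toy instance (assumed, not from the paper): the "model" is the identity and
# the robustness encodes the spec "all input values stay below 0.5".
u, r = falsify(lambda u: u, lambda y: 0.5 - max(y), dim=3)
print(r < 0)
```

In a real setting, `model` would wrap a simulation (e.g., a Simulink model) and `spec_rob` would compute the STL robustness of the output trace, as in Definition 2.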
3 QB-Robustness
The scale problem is a known, important issue that negatively affects the performance of falsification; it arises when connective operators (i.e., conjunction and disjunction) whose operands predicate on different signals appear in the STL formula under falsification. According to the classic quantitative robust semantics (see Definition 2), the robustness of those formulas is calculated by comparing (minimum for conjunction, maximum for disjunction) the robustness values coming from the different operand subformulas. However, since different signals may differ in magnitude, the comparison may be biased, such that one signal \(\mathbf {w} \) may always (or often) mask the contribution of the others, and, therefore, the final robustness may be dominated by this signal \(\mathbf {w} \). Note that, although the scale problem originates at connective operators, it is not local to the point of their application: it always propagates to the robustness of the whole formula. The scale problem has been shown to be a root cause of failure in many falsification problems [21, 40].
In this work, we propose a novel approach for solving the scale problem in falsification. It consists in introducing a new semantics for STL that does not suffer from the scale problem; this semantics is then used in a falsification approach based on Monte Carlo Tree Search. We describe the details of the new semantics in this section, and the new falsification approach in Sect. 4.
The newly proposed semantics, called QB-Robustness, combines quantitative robustness and Boolean satisfaction. By construction, it never compares quantitative robustness values that come from different subformulas, thus avoiding the scale problem. QB-Robustness is defined for all STL formulas, except for the "until" operator \(\varphi _1 \mathbin {\mathcal {U}_{I}} \varphi _2\) with an arbitrary \(\varphi _1\). We still support the "eventually" and "always" operators, i.e., the case \(\varphi _1 = \top \). Note that this is not a major limitation, as QB-Robustness still supports the majority of specifications used in industry: indeed, in the experiments, we were able to handle all the specifications used in falsification competitions [18], which collect benchmarks from industrial case studies.
To better explain the computation of QB-Robustness, we introduce some definitions. Let us first define the notion of immediate subformula for STL.
Definition 5
(Immediate Sub-Formulas). Let \(\varphi \) be an STL formula (see Definition 1). We define the set \(\mathsf {ISForm} (\varphi )\) of immediate subformulas of \(\varphi \) as follows:
\[
\begin{aligned}
\mathsf {ISForm} (\alpha ) &:= \varnothing &
\mathsf {ISForm} (\bot ) &:= \varnothing \\
\mathsf {ISForm} (\textstyle \bigwedge _i \varphi _i) &:= \{\varphi _1, \dots , \varphi _K\} &
\mathsf {ISForm} (\textstyle \bigvee _i \varphi _i) &:= \{\varphi _1, \dots , \varphi _K\} \\
\mathsf {ISForm} (\lnot \varphi ) &:= \mathsf {ISForm} (\varphi ) &
\mathsf {ISForm} (\Box _{I}\varphi ) := \mathsf {ISForm} (\Diamond _{I}\varphi ) &:= \mathsf {ISForm} (\varphi )
\end{aligned}
\]
Intuitively, the immediate subformula set of a connective (conjunction or disjunction) contains all its operands; for a unary operator (temporal operators, negation, etc.), it is the immediate subformula set of its argument.
The computation of QB-Robustness requires selecting some nested immediate subformulas. To this aim, we introduce the notion of subformula sequence.
Definition 6
(Sub-Formula Sequence). Let \(\varphi \) be an STL formula. A subformula sequence \(\varSigma = \sigma _1\cdot \dotsc \cdot \sigma _L\) w.r.t. \(\varphi \) is defined as follows:
\[
\varSigma :=
\begin{cases}
\varepsilon & \text {if } \mathsf {ISForm} (\varphi ) = \varnothing \\
\sigma _1 \cdot \varSigma ' & \text {with } \sigma _1 \in \mathsf {ISForm} (\varphi ) \text { and } \varSigma ' \text { a subformula sequence w.r.t. } \sigma _1
\end{cases}
\]
where \(\cdot \) denotes sequence concatenation. We use \(\varSigma _k\) to denote the k-th element of \(\varSigma \). Moreover, we denote the first element by \(\varSigma _{\mathsf {head}}\) and the last element by \(\varSigma _{\mathsf {rear}}\), and we use \(\varSigma _{\overline{\mathsf {head}}}\) to denote \(\varSigma \) without \(\varSigma _{\mathsf {head}}\). We write \(\varepsilon \) for the empty sequence; when \(\mathsf {ISForm} (\varphi ) = \varnothing \), we use \(\varepsilon \) as the subformula sequence of \(\varphi \). We denote by \(\boldsymbol{\varSigma }_{\varphi }\) the set of all the subformula sequences rooted in \(\varphi \).
To be specific, in a subformula sequence \(\varSigma \), each element is one of the subformulas of the previous element. This means that, for Boolean connectives, only one of the operands is selected. Moreover, an atomic subformula predicating over a single signal can only appear as the final element of a sequence. We exploit these characteristics of \(\varSigma \) to define QB-Robustness, which combines the quantitative robustness of the subformulas related to a given signal with the Boolean satisfaction of the other subformulas. Given a sequence \(\varSigma \), QB-Robustness decides whether to compute the quantitative robust semantics or the Boolean semantics of a subformula by checking whether the subformula belongs to \(\varSigma \). This implies that, in the case of conjunction and disjunction, we evaluate the quantitative robustness of the subformula in \(\varSigma \) and the Boolean satisfaction of the other subformulas. Based on this intuition, we define the semantics of QB-Robustness in Definition 7, and demonstrate its usefulness in Theorem 1.
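The enumeration of subformula sequences can be sketched directly from the intuition above (operands for connectives, pass-through for unary operators). The tuple encoding of formulas is an assumption introduced for illustration.

```python
# Hypothetical formula encoding: ('and'/'or', p, q, ...), ('not', p),
# ('always', p), ('ev', p), or an atom represented as a string.

def isform(phi):
    """Immediate sub-formulas: operands of a connective; unary operators
    delegate to their argument; atoms have none."""
    if isinstance(phi, str):
        return []
    if phi[0] in ('and', 'or'):
        return list(phi[1:])
    return isform(phi[1])            # not / always / ev: pass through

def sequences(phi):
    """All sub-formula sequences rooted in phi, down to the atoms."""
    children = isform(phi)
    if not children:
        return [[]]                  # the empty sequence (epsilon)
    return [[c] + rest for c in children for rest in sequences(c)]

phi = ('always', ('and', 'gear<6', ('or', 'speed<130', 'rpm<4000')))
for s in sequences(phi):
    print(s)
```

On this formula there are three sequences: one selecting the atom `gear<6`, and two descending into the disjunction and selecting one of its atoms, which matches the idea that a sequence picks exactly one operand at each connective.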
Definition 7
(Semantics of QB-Robustness). Let \(\varphi \) be an STL formula as defined in Definition 1, and \(\varSigma \) be a subformula sequence w.r.t. \(\varphi \). For \(\varphi \equiv \bigwedge \varphi _i\mid \bigvee \varphi _i\), let \(\varphi _k\in \mathsf {ISForm} (\varphi )\) be the first element \(\varSigma _{\mathsf {head}}\) of \(\varSigma \); we can then represent these two cases as \(\varphi \equiv \varphi _k\wedge \varphi _{\overline{k}}\mid \varphi _k\vee \varphi _{\overline{k}}\), where \(\varphi _{\overline{k}}\) is the conjunction (resp. disjunction) of the other formulas in \(\mathsf {ISForm} (\varphi )\setminus \{\varphi _k\}\). The QB-Robustness \(\mathsf {QBRob}(\mathbf {w}, \varphi , \varSigma )\) of \(\varphi \) w.r.t. \(\varSigma \) is defined as follows:
\[
\begin{aligned}
\mathsf {QBRob}(\mathbf {w}, \alpha , \varepsilon ) &:= [\![\mathbf {w}, \alpha ]\!] \\
\mathsf {QBRob}(\mathbf {w}, \lnot \varphi , \varSigma ) &:= -\mathsf {QBRob}(\mathbf {w}, \varphi , \varSigma ) \\
\mathsf {QBRob}(\mathbf {w}, \varphi _k\wedge \varphi _{\overline{k}}, \varSigma ) &:=
\begin{cases}
\mathsf {QBRob}(\mathbf {w}, \varphi _k, \varSigma _{\overline{\mathsf {head}}}) & \text {if } \mathbf {w} \models \varphi _{\overline{k}} \\
-\infty & \text {otherwise}
\end{cases} \\
\mathsf {QBRob}(\mathbf {w}, \varphi _k\vee \varphi _{\overline{k}}, \varSigma ) &:=
\begin{cases}
\mathsf {QBRob}(\mathbf {w}, \varphi _k, \varSigma _{\overline{\mathsf {head}}}) & \text {if } \mathbf {w} \not \models \varphi _{\overline{k}} \\
\infty & \text {otherwise}
\end{cases} \\
\mathsf {QBRob}(\mathbf {w}, \Box _{I}\varphi , \varSigma ) &:= \inf _{t\in I}\, \mathsf {QBRob}(\mathbf {w} ^t, \varphi , \varSigma ) \\
\mathsf {QBRob}(\mathbf {w}, \Diamond _{I}\varphi , \varSigma ) &:= \sup _{t\in I}\, \mathsf {QBRob}(\mathbf {w} ^t, \varphi , \varSigma )
\end{aligned}
\]
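A discrete-time sketch of this computation follows: quantitative robustness is computed along the selected sequence, while the operands outside it are checked only for Boolean satisfaction (never for their robustness, so values of different scales are never compared). The tuple encoding and the trace are assumptions introduced for illustration.

```python
import math

def sat(w, phi):
    """Boolean satisfaction over a discrete-time signal (list of states)."""
    op = phi[0]
    if op == 'atom':
        return phi[1](w[0]) > 0
    if op == 'not':
        return not sat(w, phi[1])
    if op == 'and':
        return all(sat(w, p) for p in phi[1:])
    if op == 'or':
        return any(sat(w, p) for p in phi[1:])
    if op == 'always':
        a, b = phi[1]
        return all(sat(w[t:], phi[2]) for t in range(len(w)) if a <= t <= b)
    raise ValueError(op)

def qbrob(w, phi, sigma):
    """QB-Robustness sketch: quantitative along the sequence sigma (a list of
    subformula objects), Boolean satisfaction for the operands outside it."""
    op = phi[0]
    if op == 'atom':
        return phi[1](w[0])                      # sigma is empty here
    if op == 'not':
        return -qbrob(w, phi[1], sigma)
    if op in ('and', 'or'):
        k, rest = sigma[0], sigma[1:]            # selected operand, tail of sigma
        others = [p for p in phi[1:] if p is not k]
        if op == 'and':                          # the others must hold, else -inf
            return qbrob(w, k, rest) if all(sat(w, p) for p in others) else -math.inf
        # or: if some other operand already holds, this instant contributes +inf
        return math.inf if any(sat(w, p) for p in others) else qbrob(w, k, rest)
    if op == 'always':
        a, b = phi[1]
        return min(qbrob(w[t:], phi[2], sigma)
                   for t in range(len(w)) if a <= t <= b)
    raise ValueError(op)

# Example 1 spec on a 3-sample assumed trace: selecting the speed subformula
# yields the informative speed margins instead of the near-constant gear ones.
phi1 = ('atom', lambda s: 6 - s['gear'])
phi2 = ('atom', lambda s: 130 - s['speed'])
spec = ('always', (0, 2), ('and', phi1, phi2))
w = [{'gear': 3, 'speed': 100}, {'gear': 5, 'speed': 125}, {'gear': 4, 'speed': 110}]
print(qbrob(w, spec, [phi2]))   # min of the speed margins: min(30, 5, 20) = 5
```

With the sequence selecting \(\varphi _1\) instead, the same call returns the gear margin 1, which illustrates why the choice of sequence matters for guidance.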
We now prove that the semantics of QB-Robustness is equivalent (in the sense of satisfaction) to the Boolean semantics, so that it can be used to show the violation of a specification in a falsification algorithm, as we do in this paper.
Theorem 1
Let \(\varphi \) be an STL formula. Given a signal \(\mathbf {w}\), for any \(\varSigma \in \boldsymbol{\varSigma }_{\varphi } \), it holds that \(\mathsf {QBRob}(\mathbf {w}, \varphi , \varSigma ) > 0\) implies \(\mathbf {w} \models \varphi \). Similarly, for any \(\varSigma \in \boldsymbol{\varSigma }_{\varphi } \), it holds that \(\mathsf {QBRob}(\mathbf {w}, \varphi , \varSigma ) < 0\) implies \(\mathbf {w} \not \models \varphi \).
Proof
We first recall from [19, Prop. 16] that \({ [\![\mathbf {w}, \varphi ]\!]} < 0\) implies \(\mathbf {w} \not \models \varphi \), and that \( [\![\mathbf {w},\varphi ]\!] > 0\) implies \(\mathbf {w} \models \varphi \). We prove Theorem 1 by induction on the structure of the formula.

Case \(\varphi = \alpha \). By Definition 7, \(\mathsf {QBRob}(\mathbf {w}, \alpha , \varepsilon ) > 0\) indicates that \({ [\![\mathbf {w}, \alpha ]\!]} > 0\) and hence \(\mathbf {w} \models \alpha \); similarly, \(\mathsf {QBRob}(\mathbf {w}, \alpha , \varepsilon ) < 0\) indicates that \({ [\![\mathbf {w}, \alpha ]\!]} < 0\) and hence \(\mathbf {w} \not \models \alpha \).

For the following cases, let us assume, as induction hypothesis, that for an arbitrary formula \(\varphi '\) and its subformula sequence \(\varSigma '\), \(\mathsf {QBRob}(\mathbf {w}, \varphi ', \varSigma ') > 0\) implies \({ [\![\mathbf {w}, \varphi ' ]\!]} > 0\), and \(\mathsf {QBRob}(\mathbf {w}, \varphi ', \varSigma ') < 0\) implies \({ [\![\mathbf {w}, \varphi ' ]\!]} < 0\). We aim to prove that Theorem 1 also holds for the formula \(\varphi \) obtained by applying, in each of the following cases, an operator to \(\varphi '\), with \(\varSigma \) the subformula sequence of \(\varphi \).

Case \(\varphi = \varphi '\wedge \psi \), where \(\psi \) is an arbitrary formula. Let \(\varSigma = \varphi '\cdot \varSigma '\), and let us consider the two cases in which \(\mathsf {QBRob}(\mathbf {w}, \varphi , \varSigma )\) is negative and positive separately:

\(*\) If \(\mathsf {QBRob}(\mathbf {w}, \varphi , \varSigma ) < 0\), there are two subcases:

\(\cdot \) if \(\mathsf {QBRob}(\mathbf {w}, \varphi ', \varSigma ') < 0\), then \({ [\![\mathbf {w}, \varphi ' ]\!]} < 0\) (by assumption). Then, by the robust semantics of conjunction, \({ [\![\mathbf {w}, \varphi ]\!]} < 0\) also holds, and so does \(\mathbf {w} \not \models \varphi \).

\(\cdot \) if \(\mathsf {QBRob}(\mathbf {w}, \varphi ', \varSigma ') > 0\), then \({ [\![\mathbf {w}, \varphi ' ]\!]} > 0\) (by assumption). Since \(\mathsf {QBRob}(\mathbf {w}, \varphi , \varSigma ) < 0\), by Definition 7 it must be that \(\mathbf {w} \not \models \psi \), and, therefore, \(\mathbf {w} \not \models \varphi \).


\(*\) If \(\mathsf {QBRob}(\mathbf {w}, \varphi , \varSigma ) > 0\), it means that \(\mathsf {QBRob}(\mathbf {w}, \varphi ', \varSigma ') > 0\) and \(\mathbf {w} \models \psi \) (by Definition 7). By assumption, if \(\mathsf {QBRob}(\mathbf {w}, \varphi ', \varSigma ') > 0\), then \({ [\![\mathbf {w}, \varphi ' ]\!]} > 0\). Therefore, \(\mathbf {w} \models \varphi \).


Case \(\varphi = \Box _{I}\varphi '\). Let \(\varSigma = \varSigma '\), and let us consider the two cases in which \(\mathsf {QBRob}(\mathbf {w}, \varphi , \varSigma )\) is negative and positive separately:

\(*\) By Definition 7, \(\mathsf {QBRob}(\mathbf {w}, \varphi , \varSigma )< 0\) indicates that there exists a \(t\in I\) such that \(\mathsf {QBRob}(\mathbf {w} ^{t}, \varphi ', \varSigma ) < 0\). By assumption, it holds that \(\mathbf {w} ^{t}\not \models \varphi '\). Then, by the semantics of the always operator \(\Box \), it holds that \(\mathbf {w} \not \models \varphi \).

\(*\) By Definition 7, \(\mathsf {QBRob}(\mathbf {w}, \varphi , \varSigma ) > 0\) indicates that for all \(t\in I\) it holds that \(\mathsf {QBRob}(\mathbf {w} ^{t}, \varphi ', \varSigma ) > 0\). Then, by assumption, it holds that for all \(t\in I\), \(\mathbf {w} ^t\models \varphi '\). So, by the semantics of the always operator \(\Box \), it holds that \(\mathbf {w} \models \varphi \).


Case \(\varphi = \lnot \varphi '\). Let \(\varSigma = \varSigma '\), and let us consider the two cases in which \(\mathsf {QBRob}(\mathbf {w}, \varphi , \varSigma )\) is negative and positive separately:

\(*\) By Definition 7, \(\mathsf {QBRob}(\mathbf {w}, \varphi , \varSigma ) < 0\) indicates that \(\mathsf {QBRob}(\mathbf {w}, \varphi ', \varSigma ') > 0\). By assumption, it holds that \(\mathbf {w} \models \varphi '\), and therefore, \(\mathbf {w} \not \models \varphi \).

\(*\) By Definition 7, \(\mathsf {QBRob}(\mathbf {w}, \varphi , \varSigma ) > 0\) indicates that \(\mathsf {QBRob}(\mathbf {w}, \varphi ', \varSigma ') < 0\). By assumption, it holds that \(\mathbf {w} \not \models \varphi '\), and, therefore, \(\mathbf {w} \models \varphi \).


The cases \(\varphi = \varphi '\vee \psi \) and \(\varphi = \Diamond _{I}\varphi '\) follow similar proof patterns, and are thus omitted.

We use an example to illustrate how QB-Robustness is used for checking the satisfaction of an STL formula.
Example 2
Let \(\mathbf {w} :[0,T] \rightarrow \mathbb {R}^2\) be a 2-dimensional signal and \(\varphi = \Box _{I}(\varphi _1\vee \varphi _2)\) be an STL formula, where \(\varphi _1\) and \(\varphi _2\) are two atomic formulas. Intuitively, to falsify \(\varphi \), there must exist \(t\in I\) such that \(\mathbf {w} ^t\not \models \varphi _1\) and \(\mathbf {w} ^t\not \models \varphi _2\). Let us consider a non-trivial falsification problem in which, for most signals \(\mathbf {w} \), the sets \(\{t\in I \mid \mathbf {w} ^t\not \models \varphi _1\}\) and \(\{t\in I \mid \mathbf {w} ^t\not \models \varphi _2\}\) are non-empty and disjoint.
By Definition 7, given the subformula sequence \(\varSigma = \varphi _1\) of \(\varphi \), the corresponding QB-Robustness is \(\mathsf {QBRob}(\mathbf {w}, \varphi , \varSigma ) = \inf _{t\in I}\mathsf {QBRob}(\mathbf {w} ^t, \varphi _1\vee \varphi _2, \varphi _1)\), i.e., it takes the infimum of \(\mathsf {QBRob}(\mathbf {w} ^t, \varphi _1\vee \varphi _2, \varphi _1)\) over \(t\in I\). Again, by Definition 7, for any \(t'\in I\), \(\mathsf {QBRob}(\mathbf {w} ^{t'}, \varphi _1\vee \varphi _2, \varphi _1)\) is computed as follows:

if for a \(t' \in I\) it holds that \(\mathbf {w} ^{t'}\models \varphi _2\), then \(\mathsf {QBRob}(\mathbf {w} ^{t'}, \varphi _1\vee \varphi _2, \varphi _1) = \infty \), and so the infimum over \(t \in I\) cannot be attained at \(t'\);

if for a \(t' \in I\) it holds that \(\mathbf {w} ^{t'}\not \models \varphi _2\), then \(\mathsf {QBRob}(\mathbf {w} ^{t'}, \varphi _1\vee \varphi _2, \varphi _1) = \mathsf {QBRob}(\mathbf {w} ^{t'}, \varphi _1, \varepsilon ) = { [\![\mathbf {w} ^{t'}, \varphi _1 ]\!]}\). In this case, \(\mathsf {QBRob}(\mathbf {w} ^{t'}, \varphi _1\vee \varphi _2, \varphi _1)\) can determine the value of the infimum.
Therefore, when \(\varSigma = \varphi _1\), it holds that \(\mathsf {QBRob}(\mathbf {w}, \varphi , \varSigma ) = \inf _{t\in S}{ [\![\mathbf {w} ^t, \varphi _1 ]\!]}\), where \(S = \{t\in I\mid \mathbf {w} ^t\not \models \varphi _2\}\), i.e., the infimum of the quantitative robustness of \(\varphi _1\) over the instants at which \(\varphi _2\) is violated. Indeed, once this value is negative, there exists a point \(t\in I\) at which both \(\varphi _1\) and \(\varphi _2\) are violated; by the Boolean semantics of always and disjunction, \(\varphi \) is then violated.
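This computation can be illustrated numerically on assumed discrete traces (the atoms, traces, and sample count below are illustrative assumptions, not taken from the paper):

```python
import math

# Assumed atoms: phi1 = x > 0, phi2 = y > 0, over 5 time samples.
x = [3.0, 1.0, -2.0, 0.5, 4.0]
y = [1.0, -1.0, -1.0, -1.0, 2.0]

# QBRob(w, always(phi1 or phi2), Sigma = phi1): at instants where phi2 holds
# the disjunct contributes +inf; elsewhere it contributes the margin of phi1.
per_t = [math.inf if yt > 0 else xt for xt, yt in zip(x, y)]
print(min(per_t))   # the infimum over S = {t | phi2 violated} of [[w^t, phi1]]
```

Here the result is negative (attained at the instant where both margins are negative), so the trace falsifies \(\varphi \), exactly as argued above.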
4 MCTS-Based Falsification Guided by QB-Robustness
QB-Robustness never compares robustness values coming from signals with different magnitudes and, therefore, does not suffer from the scale problem. As such, it can be used in falsification approaches in place of the classical, purely quantitative robustness.
However, a subformula sequence \(\varSigma \) is required when calculating QB-Robustness, and such a sequence is not unique (see Definition 7). Note that the selection of the sequence can affect the performance of the numerical optimization algorithms used in falsification. Let us consider \(\varphi \equiv \Box (( gear< 6) \wedge ( speed < 130))\) as an example. As explained in Sect. 1, numerical optimization performs better if guided by the robustness values coming from \( speed \) rather than by those coming from \( gear \). Therefore, in a falsification approach using QB-Robustness, it is important to select an appropriate subformula sequence \(\varSigma \).
By using QB-Robustness, the problem of falsifying an STL formula \(\varphi \) consists in finding both a signal \(\mathbf {w} \) and a subformula sequence \(\varSigma \) such that \(\mathsf {QBRob}(\mathbf {w}, \varphi , \varSigma ) < 0\). The selection of \(\varSigma \) is discrete, while the search for \(\mathbf {w} \) is numerical. In order to combine these two processes, different in nature, we propose to adapt Monte Carlo Tree Search (MCTS) [8, 28]. In the following, we first give a brief introduction to MCTS in Subsect. 4.1, and then present the application of MCTS to our falsification problem in Subsect. 4.2.
4.1 MCTS Background
MCTS exemplifies the "trial and error" philosophy, and has achieved great success over the past decade, most notably in fields such as computer Go [35]. MCTS explores the action space given by the possible actions of the system; in Go, for example, these are the positions where the next stone can be placed. The approach builds a tree of sequences of actions, and assigns rewards to the different branches. MCTS performs the search by iteratively taking the following four steps (see Fig. 1, where the general scheme is adapted to our setting, for illustration).

Selection. It selects a node to expand or to reason about. Initially, selection has no other choice than the root. When a node has multiple expanded children, selection will be done according to the UCB1 [6] algorithm.

Expansion. Child expansion happens after selection if the selected node has unexpanded children. A child will be added to the tree during expansion.

Playout. After a node has just been expanded or a leaf is reached, a playout is performed to evaluate the node. The evaluation is given by a reward, a real number in [0, 1], which can be interpreted differently in different contexts. In Go, for example, the reward of a position is the winning rate when a stone is placed there; this is estimated by randomly playing the game to the end n times and taking the ratio \(\frac{n_w}{n}\) of wins as the winning rate.

Backpropagation. Backpropagation updates the number of visits and the reward of the nodes along the path from the playout node to the root. These data are used in subsequent iterations to decide which branches to explore.
In the end, the action space is explored in an unbalanced manner that focuses on the most promising subspaces, i.e., those giving the highest rewards.
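To make the four phases concrete, here is a minimal, self-contained sketch of the MCTS loop in Python. The search problem (a hypothetical 4-bit target) and all names are purely illustrative, not part of our approach; the best-reward backpropagation and the UCB1 constant c mirror the choices described later in Subsect. 4.2.

```python
import math
import random

# Toy action space: choose 4 bits, one per tree level; the reward of a complete
# choice is the fraction of bits matching a hypothetical target, so it lies in [0, 1].
TARGET = (1, 0, 1, 1)

class Node:
    def __init__(self, prefix, parent=None):
        self.prefix = prefix      # sequence of actions taken so far
        self.parent = parent
        self.children = {}        # action -> Node
        self.visits = 0
        self.reward = 0.0         # best playout reward seen below this node

def ucb1(parent, child, c=0.2):
    # Exploitation (reward) plus exploration (visit-count term), as in UCB1.
    return child.reward + c * math.sqrt(2 * math.log(parent.visits) / child.visits)

def playout(prefix):
    # Randomly complete the action sequence and evaluate it.
    bits = list(prefix) + [random.randint(0, 1) for _ in range(len(TARGET) - len(prefix))]
    return sum(b == t for b, t in zip(bits, TARGET)) / len(TARGET)

def mcts_iteration(root):
    node = root
    # Selection: descend while the current node is fully expanded.
    while len(node.prefix) < len(TARGET) and len(node.children) == 2:
        node = max(node.children.values(), key=lambda ch: ucb1(node, ch))
    # Expansion: add one unexpanded child, unless we are at maximum depth.
    if len(node.prefix) < len(TARGET):
        action = random.choice([a for a in (0, 1) if a not in node.children])
        node.children[action] = Node(node.prefix + (action,), parent=node)
        node = node.children[action]
    # Playout and backpropagation.
    r = playout(node.prefix)
    while node is not None:
        node.visits += 1
        node.reward = max(node.reward, r)
        node = node.parent

random.seed(1)
root = Node(())
for _ in range(500):
    mcts_iteration(root)
```

After a few hundred iterations, the branch leading to the target accumulates the highest reward, so a greedy descent from the root recovers the best first action.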
4.2 Proposed QB-Robustness-Guided Falsification Approach
We here propose a falsification framework based on MCTS in which, during tree construction, we synthesize and select a subformula sequence that facilitates the falsification progress the most, and, at the bottom layer of the tree, we run numerical optimization to search for a falsifying input and provide feedback (i.e., backpropagation) to guide the sequence selection.
We formalize our algorithm in Algorithm 2 and visualize its execution in Fig. 1. In the following, we elaborate on our approach.
We construct the tree as follows: each node represents a sequence of formulas, and each edge out of a node corresponds to a subformula of the last element of the sequence represented by that node. The root is initialized with a sequence holding \(\varphi \) only (Lines 2–3) and some other properties, including the number of visits to the different nodes (Line 4), the reward (Line 5), and the history of hill-climbing sampling (Line 6). The main process of MCTS consists in calling the MCTSSearch function iteratively with the root as argument (Line 8), until the MCTS budget is exhausted or a falsifying input is found (Line 7). The MCTSSearch function (Line 9) goes through the four phases of the original MCTS algorithm, namely selection, expansion, playout, and backpropagation.
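The node and edge construction above can be sketched as follows. The STL class is a hypothetical AST encoding used only for illustration, not the actual ForeSee data structure.

```python
# Each MCTS node holds a sequence of formulas; its outgoing edges extend the
# sequence by one direct subformula of the sequence's last element.

class STL:
    """Hypothetical STL AST node: an operator and its direct subformulas."""
    def __init__(self, op, *subs):
        self.op, self.subs = op, subs
    def __repr__(self):
        return self.op if not self.subs else f"{self.op}({', '.join(map(repr, self.subs))})"

def children(sequence):
    """Child sequences of an MCTS node: one per direct subformula of its last element."""
    return [sequence + [sub] for sub in sequence[-1].subs]

# phi = always (speed < 135 and rpm < 4780), with atomic formulas as leaves
phi = STL("always", STL("and", STL("speed<135"), STL("rpm<4780")))

root = [phi]                  # the root holds the sequence containing phi only
level1 = children(root)       # one child: the sequence phi . (and ...)
level2 = children(level1[0])  # two children, one per atomic conjunct
```

Each leaf of this tree corresponds to one subformula sequence \(\varSigma \) ending in an atomic formula, which is what the playout phase falsifies.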
Selection. Selection happens when a node has children (Line 10) and these have all been expanded (Line 11). A child is selected according to the UCB1 algorithm [6] (Line 12), which strikes a balance between exploration and exploitation. Exploitation is embodied by the reward \(R(\varSigma \cdot \varphi _i)\): the higher the reward, the more likely a falsifying input is found along that branch. Exploration, instead, is accounted for by the term \(\sqrt{\frac{2\ln {N(\varSigma )}}{N(\varSigma \cdot \varphi _i)}}\), which is negatively correlated with the number of visits to a child: the more often the child has been visited, the less likely it is to be visited again. The scalar c is a tunable parameter that weighs the trade-off between exploration and exploitation. After a child \(\varSigma \cdot \varphi _k\) is selected, it is taken as the argument of the next MCTSSearch call (Line 18).
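In code, the UCB1 rule amounts to the following; the child statistics are hypothetical numbers chosen for illustration. Note how raising c shifts the choice from the highest-reward child to the least-visited one.

```python
import math

def ucb1_score(reward, child_visits, parent_visits, c=0.2):
    # R(Sigma.phi_i) + c * sqrt(2 ln N(Sigma) / N(Sigma.phi_i))
    return reward + c * math.sqrt(2 * math.log(parent_visits) / child_visits)

# Hypothetical (reward, visits) statistics for three children of one node:
stats = {"phi1": (0.8, 10), "phi2": (0.5, 2), "phi3": (0.6, 5)}
parent_visits = sum(v for _, v in stats.values())  # 17

# With a small c, exploitation dominates and the best-reward child phi1 wins;
# with c = 1, the rarely visited phi2 wins thanks to its exploration bonus.
greedy = max(stats, key=lambda k: ucb1_score(*stats[k], parent_visits))          # "phi1"
explore = max(stats, key=lambda k: ucb1_score(*stats[k], parent_visits, c=1.0))  # "phi2"
```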
Expansion. If not all the children of a node have been expanded (Line 13), a child will be expanded. Expansion consists in randomly selecting a child from the unexpanded child list (Line 14), adding it to the tree (Line 15), and initializing its properties, including the number of visits and the history (Lines 16–17). After expansion, the newly expanded child is taken as the argument of the recursive call to MCTSSearch (Line 18).
Playout. If a leaf node with no children to expand is reached, the playout phase starts, in order to devise a reward for evaluating the visited path. In our context, we define the reward based on the best robustness value that can be obtained along the path; specifically, playout consists in running hill-climbing guided falsification to search for a minimal robustness value (Line 22). Note that the sequence \(\varSigma \) represented by a leaf node is the concatenation of \(\varphi \) and a subformula sequence of \(\varphi \). We extract the suffix of \(\varSigma \), i.e., the subformula sequence, to compute the QB-Robustness that guides the hill-climbing optimization (Line 23). If a negative QB-Robustness is found (Line 24), the whole algorithm terminates and the input signal \(\mathbf {u} \) that triggers the negative QB-Robustness is returned as the falsifying input (Line 25); otherwise, the sampling history of hill climbing is saved (Line 26), so that a future playout at the same leaf can resume from that context. After playout, the reward of the leaf node is updated according to the definition of reward introduced below.

Reward. Since our goal is to find a sequence \(\varSigma \) with which hill-climbing optimization can minimize \(\mathsf {QBRob}(\mathbf {w}, \varphi , \varSigma )\) smoothly, we connect the reward with the hill-climbing progress. Formally, given a sampling history H, our reward (Line 27) is defined as \( \mathsf {Rwd}(\mathsf {rb} ', H)\;:=\; \frac{\max {\mathsf {rb} _h} \;-\; \min {(\{\mathsf {rb} '\}\cup \mathsf {rb} _{h})}}{\max {\mathsf {rb} _h}} \), where \(\mathsf {rb} _h\) is the set of robustness values recorded in H and \(\mathsf {rb} '\) is the minimal robustness value found by the current playout.
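The reward definition translates directly into code. This is a sketch; the exact bookkeeping of the history H in Algorithm 2 may differ.

```python
def reward(rb_new, history):
    """Rwd(rb', H): the larger the robustness drop achieved by hill climbing,
    the higher the reward. `history` holds the (positive) robustness values of
    previous playouts at this leaf; rb_new is the minimum found by the current
    one. A negative rb_new would mean falsification succeeded, and the whole
    algorithm terminates before this reward is ever used."""
    hi = max(history)
    lo = min([rb_new] + history)
    return (hi - lo) / hi

# e.g. previous playouts reached robustness 1.0 and 0.6; the current one reaches 0.2:
reward(0.2, [1.0, 0.6])  # -> 0.8
```

A leaf on which hill climbing keeps reducing the robustness thus keeps earning rewards close to 1, which makes UCB1 revisit it; a stagnating leaf earns low rewards and is gradually abandoned.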
Backpropagation. In MCTS, the playout result of a leaf is backpropagated to the higher-layer nodes along the path, so that it can inform future selections at those layers. Backpropagation updates two properties of each ancestor of the leaf, up to the root: the reward (Line 19) and the number of visits (Line 28).
Remark 1
(Approach Complexity). With respect to classical falsification, our approach introduces an exploration phase that searches for the “best” subformula sequence to instantiate QB-Robustness. The number of such sequences corresponds to the number of atomic subformulas (and so to the number of leaves of the tree). Since most of the time is spent on playout, the complexity of our approach grows linearly with the number of atomic subformulas.
5 Experimental Evaluation
In this section, we present the experiments we conducted to evaluate the effectiveness of the proposed approach. We first introduce the experiment setup in Subsect. 5.1, and then we present the experimental evaluation results by answering three research questions in Subsect. 5.2.
5.1 Experiment Setup
Simulink Models and Specifications. As our benchmarks, we selected three Simulink models frequently used in the falsification community (e.g., in the falsification competitions [18]), and 30 specifications defined for them. All these models are complex hybrid systems with multiple input and output signals. The specifications are STL formulas that formalize system requirements regarding safety, performance, etc. Since we are interested in assessing the influence of the scale problem on the performance of the compared falsification approaches, all the considered specifications predicate over at least two signals. Table 1 reports the 30 specifications under test; the IDs of the specifications identify the models they belong to. The three models and their specifications are described as follows.

Automatic Transmission (AT) [24] has two input signals, \( throttle \in [0,100]\) and \( brake \in [0,325]\), and three output signals: \( gear \), \( speed \), and \( rpm \). Most of the specifications we use formalize safety requirements of the system. For instance, AT2 requires that, when \( gear \) is as high as 4, \( rpm \) should not exceed 4300; AT3 is an adaptation of the example used in Sect. 1; AT10–12 reason about the relationship among the three output signals; AT17 specifies three properties for three different time intervals; AT18 specifies different properties for different values of \( gear \); AT14, AT21, and AT22 impose logical constraints on the input signals, in addition to the property under consideration.

Abstract Fuel Control (AFC) [25] takes two input signals, \( PedalAngle \in [8.8, 70]\) and \( EngineSpeed \in [900, 1100]\), and outputs a ratio \(\mu \) reflecting the deviation of the air-fuel ratio from its reference value. The basic safety requirement of this system is that \(\mu \) should not deviate too much from the reference value (AFC1); in addition, our specifications also reason about the resilience of the system (AFC5 and AFC6) and impose input constraints (AFC2–6).

Free Floating Robot (FFR) [11] models a robot moving in a 2-dimensional space. It has four input signals \(u_1, u_2, u_3, u_4\in [-10, 10]\), which are boosters of the robot, and four output signals: the position in terms of coordinate values x, y, and their first-order derivatives \(\dot{x}, \dot{y}\). The specifications regulate the kinetic properties of the robot: FFR1 requires the robot to pass through an area around the point (4, 4) under an input constraint, and FFR2 requires the robot to stay in an area for at least 2 s.
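To illustrate why such multi-signal specifications are prone to the scale problem, here is a sketch of the classical min-based robustness for a specification of the shape of AT4, \(\Box ~( speed<135 \wedge rpm <4780)\), using a simplified discrete-time semantics and hypothetical traces:

```python
def rob(speed, rpm):
    """min-based robustness of always (speed < 135 and rpm < 4780):
    rob(p < c) at each instant is c - p; 'and' is min; 'always' is min over time."""
    return min(min(135 - s, 4780 - r) for s, r in zip(speed, rpm))

# Trace B drives rpm much closer to its threshold than trace A does...
trace_a = ([100, 110], [3000, 3100])
trace_b = ([100, 110], [4600, 4700])
# ...yet both traces get the same robustness (the speed margin 135 - 110 = 25),
# because the small-scale speed conjunct masks all progress on rpm:
assert rob(*trace_a) == rob(*trace_b) == 25
```

An optimizer guided by this robustness value thus receives no feedback for driving \( rpm \) toward its threshold; this masking effect is exactly what QB-Robustness avoids.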
Baseline Approach and Our Proposed Approach. In our experiments, we compare the performance of our proposed approach with the baseline approach Breach. We implemented our approach in the tool ForeSee, which stands for FORmula Exploitation by Sequence trEE for falsification.
Breach is a state-of-the-art falsification tool that implements the classic falsification workflow introduced in Sect. 2. The quantitative robustness computation in Breach is based on the robust semantics given in Definition 2. Breach also encapsulates several stochastic optimization algorithms, such as CMA-ES and Simulated Annealing. Our implementation of ForeSee uses Breach only for interfacing with the Simulink model and for computing the quantitative robustness; the computation of QB-Robustness and the implementation of the MCTS algorithm are novel. Since CMA-ES has proved to be the state-of-the-art stochastic algorithm [39], we select CMA-ES as the back-end optimizer for the playout phase.^{Footnote 3}
We apply the two approaches, ForeSee and Breach, to each benchmark specification reported in Table 1. Since both approaches are based on stochastic optimization, we repeat each experiment 30 times, as suggested by a guideline for conducting experiments with randomized algorithms [5]. For each experiment, both approaches are given a total timeout \(B_M\) of 900 s (see Algorithm 2).
Evaluation Metrics. As the first evaluation metric, we compute the falsification rate (FR), i.e., the number of runs (out of 30) in which the approach returns a falsifying input. FR is thus an indicator of the effectiveness of an approach, i.e., of its ability to falsify the specification. As the second evaluation metric, we compute the average time (in seconds) of the successful falsification runs, which is an indicator of the efficiency of the approach. We do not report the number of simulations, as it is consistent with the execution time.
Experiment Platform. In our experiments, we use Breach [13] (ver. 1.2.13) with CMA-ES (the state of the art). Breach accepts piecewise constant signals as input for the Simulink models; we use the same settings used in the falsification competitions [18]: piecewise constant signals with five control points for AT and AFC, and with four control points for FFR. As configuration of MCTS (see Algorithm 2), we set the UCB1 scalar c to 0.2 and the playout budget \(B_P\) to 10 generations. The experiments have been executed on an Amazon EC2 c4.2xlarge instance (2.9 GHz Intel Xeon E5-2666 v3, 15 GB RAM).
5.2 Evaluation
We here analyze the experimental results using three research questions (RQs).
RQ1. Does the proposed approach perform better than state-of-the-art falsification approaches?
In this RQ, we aim at assessing whether the proposed approach is indeed able to tackle the scale problem in falsification and performs better than state-of-the-art approaches. Table 2 reports, for each specification benchmark, the falsification rate FR and the average execution time of our proposed approach ForeSee and of the baseline Breach. The table further reports the difference of the two metrics between the two approaches. We observe that for 25 benchmarks out of 30, ForeSee has a better FR, and in 15 of these the improvement is significant, i.e., ForeSee's FR is at least 15 units higher than that of Breach (highlighted in gray). Note that there are notable cases, such as AT3, AT13, AT16, and AT17, in which Breach finds at most two falsifying inputs, while ForeSee always finds at least 29. In four cases, Breach has a better FR: while for AT8, AFC6, and FFR2 the difference is minimal, it is quite large for AT14. We further inspected this specification and its corresponding model (see Table 1); we noticed that all the subformulas in AT14 must be falsified to falsify the whole specification^{Footnote 4}, and they are all difficult to falsify. In such a case, there is no best subformula sequence \(\varSigma \): the time spent by ForeSee in exploring different \(\varSigma \) therefore provides no improvement.
Regarding the execution time, there is no clear trend: sometimes ForeSee is faster, other times Breach is. However, even in the cases in which ForeSee is slower, it still finds a falsifying input within the timeout (thus leading to better falsification rates).
RQ2. Does the proposed approach solve the scale problem effectively?
The benchmarks reported in Table 1 and experimented in RQ1 predicate over signals having different scales, and so they suffer from the scale problem; RQ1 showed that ForeSee is very efficient in falsifying them. In this RQ, we conduct a more systematic study of the effects of the scale problem; indeed, the scale problem can manifest itself in different ways, depending on the difference between the orders of magnitude of the different signals (e.g., speed [km/h] vs. rpm, or speed [km/h] vs. rph). To assess this, we take six specifications from Table 1 and artificially modify their outputs: namely, we multiply by \(10^k\) (with different values of k depending on the specification) the \( speed \) of AT1, AT3, AT4, and AT9; the \( rpm \) of AT15; and the \( EngineSpeed \) of AFC3. For each artificial rescaling, both the Simulink model and the specification have been changed.^{Footnote 5} We run ForeSee and Breach on these rescaled benchmarks. Table 3 reports the experimental results for each k, together with the minimum, maximum, and mean of FR and execution time. We observe that the performance of Breach, in terms of FR, is very sensitive to the scale problem. Indeed, for all the specifications, FR decreases as k increases or decreases; notable examples are AT3 and AT4, for which Breach can (almost) always falsify with the minimum k, but never falsifies with the two largest values of k. This demonstrates the effects of the scale problem on falsification approaches that rely only on the quantitative robust semantics, in which the robustness values of different signals are compared. Looking at the results of ForeSee, instead, we observe that it is much more robust, and its FR is independent of the applied rescaling. This clearly shows that our falsification approach guided by QB-Robustness succeeds in avoiding the scale problem.
These results also allow us to show that the naive normalization-based approach to the scale problem does not work, as also reported in [41]. Indeed, one may think that the scale problem could be tackled by rescaling the signals so that they have the same order of magnitude. This is not the case. Consider the results in Table 3c for AT4 (\(\Box _{[0,30]}~( speed<135 \wedge rpm <4780)\)), where \( speed \) is multiplied by \(10^k\). One may expect the best falsification result when \( speed \) is multiplied by \(10^2\), because this would bring the two signals to a similar order of magnitude. However, this rescaling gives the worst result. The best result is actually given by the rescalings that make \( speed \) even smaller (i.e., \(k=-2\) and \(k=-1\)). This means that the correct way of handling the scale problem cannot be identified in advance; instead, we need an approach such as ours, which learns the best strategy during falsification.
RQ3. How do the hyperparameters of MCTS influence the performance of the proposed approach?
Our proposed approach is an instantiation of the Monte Carlo Tree Search (MCTS) method [8, 28], which can be configured with some hyperparameters, namely the scalar c used by UCB1 (Line 12 in Algorithm 2) and the playout budget \(B_P\) (Line 21 in Algorithm 2), both used for balancing exploration and exploitation. The performance of MCTS could therefore be affected by the values of these hyperparameters; in this RQ, we try to assess this. We selected three benchmark specifications (AT17, AT19, and AT21) and varied one hyperparameter while keeping the other fixed: we experimented with \(c \in \{0, 0.02, 0.2, 0.5, 1\}\) and \(B_P=10\) (see Table 4a), and with \(B_P \in \{2, 5, 10, 15, 20\}\) and \(c=0.2\) (see Table 4b). Looking at the results of Table 4a for AT17 and AT21, there seems to be some influence of the scalar c. In AT17, the worst result in terms of FR is obtained when c is 0, meaning that MCTS only focuses on exploitation. AT17 is a specification that suffers from the scale problem, as shown by the very bad performance of Breach in Table 2; for such a specification, we need to perform some exploration to find the best \(\varSigma \): this explains the low performance of MCTS with \(c=0\). On the other hand, the worst FR of AT21 is given by the highest value \(c=1\), which requires MCTS to spend a lot of time in exploration. Since AT21 is not an extremely difficult specification (indeed, Breach has an FR of 10 in Table 2), such a very conservative approach does not pay off, while greedier approaches (i.e., with lower c) perform better.
Looking at the results of Table 4b, related to \(B_P\), its influence seems limited. The only difference appears in AT17 with \(B_P=2\), where the FR is slightly lower than in the other cases. This means that, provided a sufficiently large value of \(B_P\) is given, ForeSee is not too sensitive to it.
6 Related Work
Quality assurance of CPS has been actively studied, due to its great significance. Different approaches, including but not limited to model checking, theorem proving, rigorous numerics, and nonstandard analysis [9, 16, 20, 22, 23, 31, 33], have been proposed. However, due to scalability issues and the existence of black-box components, those approaches are not widely applied to real-world systems.
The optimization-based falsification approach inherits the search-based testing methodology and is much more scalable than pure verification-based approaches. The key issue in search-based testing is the exploration-exploitation trade-off. This issue has been discussed for the verification of quantitative properties (e.g., [34]). In the falsification community, many works tackle this problem from different perspectives. Metaheuristics refers to high-level heuristic strategies that improve search efficiency; several metaheuristic strategies have been applied to falsification, such as Simulated Annealing [1] and tabu search [10]. Coverage-guided falsification [2, 10, 15, 29] aims to guide the search using coverage metrics, so that the search space is sufficiently explored. Recently, machine learning techniques have also been applied to falsification to enhance the search: Bayesian optimization [3, 11, 36] uses an acquisition function to balance exploration and exploitation, and reinforcement learning [27, 37] naturally emphasizes exploration.
The scale problem is a recognized issue [12, 21, 40] that is known to severely affect the performance of falsification. In [40], we proposed a multi-armed bandit approach that solves the problem in a specific setting, namely safety properties with Boolean connectives: \(\Box _I~(\varphi _1\wedge \varphi _2)\) and \(\Box _I~(\varphi _1\vee \varphi _2)\). That approach is not applicable to formulas with more deeply nested subformulas, or to connectives with more operands; therefore, many of the benchmarks used in Subsect. 5.2 fall out of the scope of [40]. The techniques introduced in [12, 21] rely on an explicit declaration of input vacuity and output robustness. Compared to those approaches, our method does not need such declarations; instead, we learn the significance of each signal through tree exploration and reward computation.
MCTS, as an effective search framework, has been applied to testing hybrid systems. In [30], the authors applied an adaptation of MCTS, namely adaptive stress testing, to detect potentially dangerous cases of airborne collision. A recent study of MCTS for hybrid system falsification is [39], in which the authors discretize the search space to construct the search tree and then apply MCTS to explore different subspaces. Compared to their approach, our work aims to tackle the scale problem, and so we exploit the structure of the specification formulas to construct the tree search framework.
7 Conclusion and Future Work
Optimization-based falsification is a widely used approach for quality assurance of CPS that tries to find an input violating a Signal Temporal Logic (STL) specification. It does so by exploiting the quantitative robust semantics of the specification, trying to minimize its robustness. The performance of falsification is affected by the scale problem in the presence of comparisons between the robustness values of subformulas predicating over signals with different scales. In this paper, we proposed QB-Robustness, a new STL semantics that does not suffer from the scale problem, because it avoids such comparisons. The computation of QB-Robustness requires specifying a subformula sequence that tells for which subformulas the quantitative robustness must be computed. We then proposed a Monte Carlo Tree Search (MCTS)-based falsification approach that synthesizes a subformula sequence for QB-Robustness and uses it to guide numerical optimization. Experimental results show that the proposed approach achieves better falsification results than a state-of-the-art falsification tool that uses the standard quantitative robust semantics.
In the analysis of RQ1, we observed that, when a specification has a particular structure, our approach has no advantage; indeed, it can even decrease performance by trying to find a best subformula sequence that does not exist for the current initial sampling. As future work, we plan to devise heuristics to handle these cases: for example, we could perform a better initial sampling (see Subsect. 2.1) that provides better initial guidance.
Notes
 1.
Note that we built such a trivial example just to make the scale problem very easy to understand. However, in general, the scale problem frequently occurs on much less trivial specifications, as we will see in the experiments.
 2.
Recall from Definition 1 that the “eventually” and “always” operators are defined in terms of the “until” operator.
 3.
ForeSee is available at https://github.com/choshina/ForeSee.
 4.
Note that all binary connectives of AT14 are disjunctions; indeed, \(A \rightarrow B\) is syntactic sugar for \(\lnot A \vee B\).
 5.
Note that \(k=0\) corresponds to the experimental result in Table 2, and we report it again for reference.
References
Abbas, H., Fainekos, G.: Convergence proofs for simulated annealing falsification of safety properties. In: 50th Annual Allerton Conference on Communication, Control, and Computing, pp. 1594–1601. IEEE (2012)
Adimoolam, A., Dang, T., Donzé, A., Kapinski, J., Jin, X.: Classification and coverage-based falsification for embedded control systems. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 483–503. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_24
Akazaki, T., Kumazawa, Y., Hasuo, I.: Causality-aided falsification. In: Proceedings First Workshop on Formal Verification of Autonomous Vehicles, FVAV@iFM 2017, Turin, Italy, 19th September 2017. EPTCS, vol. 257, pp. 3–18 (2017)
Annpureddy, Y., Liu, C., Fainekos, G., Sankaranarayanan, S.: S-TaLiRo: a tool for temporal logic falsification for hybrid systems. In: Abdulla, P.A., Leino, K.R.M. (eds.) TACAS 2011. LNCS, vol. 6605, pp. 254–257. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19835-9_21
Arcuri, A., Briand, L.: A practical guide for using statistical tests to assess randomized algorithms in software engineering. In: Proceedings of the 33rd International Conference on Software Engineering, ICSE 2011, pp. 1–10. ACM, New York (2011)
Auer, P.: Using confidence bounds for exploitation-exploration trade-offs. J. Mach. Learn. Res. 3, 397–422 (2002)
Auger, A., Hansen, N.: A restart CMA evolution strategy with increasing population size. In: Proceedings of the IEEE Congress on Evolutionary Computation, CEC 2005, pp. 1769–1776. IEEE (2005)
Browne, C., et al.: A survey of Monte Carlo tree search methods. IEEE Trans. Comput. Intellig. AI Games 4(1), 1–43 (2012)
Chen, X., Ábrahám, E., Sankaranarayanan, S.: Flow*: an analyzer for nonlinear hybrid systems. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 258–263. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39799-8_18
Deshmukh, J., Jin, X., Kapinski, J., Maler, O.: Stochastic local search for falsification of hybrid systems. In: Finkbeiner, B., Pu, G., Zhang, L. (eds.) ATVA 2015. LNCS, vol. 9364, pp. 500–517. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24953-7_35
Deshmukh, J.V., Horvat, M., Jin, X., Majumdar, R., Prabhu, V.S.: Testing cyber-physical systems through Bayesian optimization. ACM Trans. Embedded Comput. Syst. 16(5), 170:1–170:18 (2017)
Dokhanchi, A., Yaghoubi, S., Hoxha, B., Fainekos, G.E.: Vacuity aware falsification for MTL request-response specifications. In: 13th IEEE Conference on Automation Science and Engineering, CASE 2017, Xi’an, China, 20–23 August 2017, pp. 1332–1337. IEEE (2017)
Donzé, A.: Breach, a toolbox for verification and parameter synthesis of hybrid systems. In: Touili, T., Cook, B., Jackson, P. (eds.) CAV 2010. LNCS, vol. 6174, pp. 167–170. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14295-6_17
Donzé, A., Maler, O.: Robust satisfaction of temporal logic over real-valued signals. In: Chatterjee, K., Henzinger, T.A. (eds.) FORMATS 2010. LNCS, vol. 6246, pp. 92–106. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15297-9_9
Dreossi, T., Dang, T., Donzé, A., Kapinski, J., Jin, X., Deshmukh, J.V.: Efficient guiding strategies for testing of temporal properties of hybrid systems. In: Havelund, K., Holzmann, G., Joshi, R. (eds.) NFM 2015. LNCS, vol. 9058, pp. 127–142. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-17524-9_10
Dreossi, T., Dang, T., Piazza, C.: Parallelotope bundles for polynomial reachability. In: Proceedings of the 19th International Conference on Hybrid Systems: Computation and Control, HSCC 2016, pp. 297–306. ACM, New York (2016)
Dreossi, T., Donzé, A., Seshia, S.A.: Compositional falsification of cyber-physical systems with machine learning components. In: Barrett, C., Davies, M., Kahsai, T. (eds.) NFM 2017. LNCS, vol. 10227, pp. 357–372. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-57288-8_26
Ernst, G., et al.: ARCH-COMP 2020 category report: falsification. In: ARCH20. 7th International Workshop on Applied Verification of Continuous and Hybrid Systems (ARCH20). EPiC Series in Computing, vol. 74, pp. 140–152. EasyChair (2020)
Fainekos, G.E., Pappas, G.J.: Robustness of temporal logic specifications for continuous-time signals. Theor. Comput. Sci. 410(42), 4262–4291 (2009)
Fan, C., Qi, B., Mitra, S., Viswanathan, M., Duggirala, P.S.: Automatic reachability analysis for nonlinear hybrid models with C2E2. In: Chaudhuri, S., Farzan, A. (eds.) CAV 2016. LNCS, vol. 9779, pp. 531–538. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41528-4_29
Ferrère, T., Nickovic, D., Donzé, A., Ito, H., Kapinski, J.: Interface-aware signal temporal logic. In: Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control, HSCC 2019, Montreal, QC, Canada, 16–18 April 2019, pp. 57–66 (2019)
Frehse, G., et al.: SpaceEx: scalable verification of hybrid systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 379–395. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_30
Hasuo, I., Suenaga, K.: Exercises in nonstandard static analysis of hybrid systems. In: Madhusudan, P., Seshia, S.A. (eds.) CAV 2012. LNCS, vol. 7358, pp. 462–478. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31424-7_34
Hoxha, B., Abbas, H., Fainekos, G.E.: Benchmarks for temporal logic requirements for automotive systems. In: 1st and 2nd International Workshop on Applied veRification for Continuous and Hybrid Systems, ARCH@CPSWeek 2014, Berlin, Germany, 14 April 2014/ARCH@CPSWeek 2015, Seattle, USA, 13 April 2015. EPiC Series in Computing, vol. 34, pp. 25–30. EasyChair (2014)
Jin, X., Deshmukh, J.V., Kapinski, J., Ueda, K., Butts, K.: Powertrain control verification benchmark. In: Proceedings of the 17th Int. Conf. on Hybrid Systems: Computation and Control, HSCC 2014, pp. 253–262. ACM, NY, USA (2014)
Kapinski, J., Deshmukh, J.V., Jin, X., Ito, H., Butts, K.: Simulation-based approaches for verification of embedded control systems: an overview of traditional and advanced modeling, testing, and verification techniques. IEEE Control Syst. 36(6), 45–64 (2016)
Kato, K., Ishikawa, F.: Learning-based falsification for model families of cyber-physical systems. In: 2019 IEEE 24th Pacific Rim International Symposium on Dependable Computing (PRDC), pp. 236–245 (December 2019)
Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006). https://doi.org/10.1007/11871842_29
Kuřátko, J., Ratschan, S.: Combined global and local search for the falsification of hybrid systems. In: Legay, A., Bozga, M. (eds.) FORMATS 2014. LNCS, vol. 8711, pp. 146–160. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10512-3_11
Lee, R., Kochenderfer, M.J., Mengshoel, O.J., Brat, G.P., Owen, M.P.: Adaptive stress testing of airborne collision avoidance systems. In: Digital Avionics Systems Conference, 2015 IEEE/AIAA 34th, pp. 6C21. IEEE (2015)
Liebrenz, T., Herber, P., Glesner, S.: Deductive verification of hybrid control systems modeled in Simulink with KeYmaera X. In: Sun, J., Sun, M. (eds.) ICFEM 2018. LNCS, vol. 11232, pp. 89–105. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-02450-5_6
Luersen, M.A., Le Riche, R.: Globalized Nelder-Mead method for engineering optimization. Comput. Struct. 82(23), 2251–2260 (2004)
Platzer, A.: Logical Foundations of Cyber-Physical Systems. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-63588-0
Seshia, S.A., Rakhlin, A.: Quantitative analysis of systems using gametheoretic learning. ACM Trans. Embed. Comput. Syst. 11(S2), 55:1–55:27 (2012)
Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2015)
Silvetti, S., Policriti, A., Bortolussi, L.: An active learning approach to the falsification of black box cyber-physical systems. In: Polikarpova, N., Schneider, S. (eds.) IFM 2017. LNCS, vol. 10510, pp. 3–17. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66845-1_1
Akazaki, T., Liu, S., Yamagata, Y., Duan, Y., Hao, J.: Falsification of cyber-physical systems using deep reinforcement learning. In: Havelund, K., Peleska, J., Roscoe, B., de Vink, E. (eds.) FM 2018. LNCS, vol. 10951, pp. 456–465. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-95582-7_27
Zhang, Z., Arcaini, P., Hasuo, I.: Hybrid system falsification under (in)equality constraints via search space transformation. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 39(11), 3674–3685 (2020)
Zhang, Z., Ernst, G., Sedwards, S., Arcaini, P., Hasuo, I.: Two-layered falsification of hybrid systems guided by Monte Carlo tree search. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 37(11), 2894–2905 (2018)
Zhang, Z., Hasuo, I., Arcaini, P.: Multi-armed bandits for Boolean connectives in hybrid system falsification. In: Dillig, I., Tasiran, S. (eds.) CAV 2019. LNCS, vol. 11561, pp. 401–420. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-25540-4_23
Zhang, Z., Lyu, D., Arcaini, P., Ma, L., Hasuo, I., Zhao, J.: On the effectiveness of signal rescaling in hybrid system falsification. In: Dutle, A., Moscato, M.M., Titolo, L., Muñoz, C.A., Perez, I. (eds.) NFM 2021. LNCS, vol. 12673, pp. 392–399. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-76384-8_24
Zutshi, A., Deshmukh, J.V., Sankaranarayanan, S., Kapinski, J.: Multiple shooting, CEGAR-based falsification for hybrid systems. In: 2014 International Conference on Embedded Software, EMSOFT 2014, New Delhi, India, 12–17 October 2014, pp. 5:1–5:10. ACM (2014)
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2021 The Author(s)
About this paper
Cite this paper
Zhang, Z., Lyu, D., Arcaini, P., Ma, L., Hasuo, I., Zhao, J. (2021). Effective Hybrid System Falsification Using Monte Carlo Tree Search Guided by QB-Robustness. In: Silva, A., Leino, K.R.M. (eds.) Computer Aided Verification. CAV 2021. Lecture Notes in Computer Science, vol. 12759. Springer, Cham. https://doi.org/10.1007/978-3-030-81685-8_29
DOI: https://doi.org/10.1007/978-3-030-81685-8_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-81684-1
Online ISBN: 978-3-030-81685-8
eBook Packages: Computer Science, Computer Science (R0)