
A Boyer-Moore Type Algorithm for Timed Pattern Matching

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9884)

Abstract

The timed pattern matching problem was formulated by Ulus et al. and has been actively studied since, with evident applications in monitoring real-time systems. The problem takes as input a timed word/signal and a timed pattern (specified either by a timed regular expression or by a timed automaton), and returns the set of those intervals for which the given timed word, when restricted to the interval, matches the given pattern. We contribute a Boyer-Moore type optimization for timed pattern matching, relying on the classic Boyer-Moore string matching algorithm and its extension to (untimed) pattern matching by Watson and Watson. We assess its effect through experiments; for some problem instances our Boyer-Moore type optimization achieves a twofold speed-up, indicating its potential in real-world monitoring tasks where data sets tend to be massive.

Keywords

Pattern matching · Regular expression · String matching · Input string · Naive algorithm

1 Introduction

The importance of systems’ real-time properties is ever growing, with rapidly diversifying applications of computer systems—cyber-physical systems, health-care systems, automated trading, etc.—being increasingly pervasive in every human activity. For real-time properties, besides classic problems in theoretical computer science such as verification and synthesis, the problem of monitoring already turns out to be challenging. Monitoring asks, given an execution log and a specification, whether the log satisfies the specification; sometimes we are furthermore interested in which segment of the log satisfies/violates the specification. In practical deployment scenarios, where we deal with a number of very long logs, finding matching segments in a computationally tractable manner is therefore a pressing yet challenging matter.

In this context, inspired by the problems of string and pattern matching of long research histories, Ulus et al. recently formulated the problem of timed pattern matching [20]. In their formalization, the problem takes as input a timed signal w (values that change over the continuous notion of time) and a timed regular expression (TRE) \(\mathcal {R}\) (a real-time extension of regular expressions); and it returns the match set \(\mathcal {M}(w,\mathcal {R})=\{(t,t')\mid t<t', w|_{(t,t')}\in L(\mathcal {R})\}\), where \(w|_{(t,t')}\) is the restriction of w to the time interval \((t,t')\) and \(L(\mathcal {R})\) is the set of signals that match \(\mathcal {R}\).

Since its formulation timed pattern matching has been actively studied. The first offline algorithm is introduced in [20]; its application in conditional performance evaluation is pursued in [10]; and in [21] an online algorithm is introduced based on Brzozowski derivatives. Underlying these developments is the fundamental observation [20] that the match set \(\mathcal {M}(w,\mathcal {R})\)—an uncountable subset of \(\mathbb {R}_{\ge 0}^{2}\)—allows a finitary symbolic representation by inequalities.

Contributions. In this paper we are concerned with efficiency in timed pattern matching, motivated by our collaboration with the automotive industry on various light-weight verification techniques. Towards that goal we introduce an optimization that extends the classic Boyer-Moore algorithm for string matching (finding a pattern string \( pat \) in a given word w). Specifically we rely on the extension of the latter to pattern matching (finding subwords of w that are accepted by an NFA \(\mathcal {A}\)) by Watson & Watson [24], and introduce its timed extension.

We evaluate its efficiency through a series of experiments; in some cases (including an automotive example) our Boyer-Moore type algorithm outperforms a naive algorithm (without the optimization) by a factor of two. This constant speed-up may be uninteresting from the complexity-theoretic point of view. However, given that in real-world monitoring scenarios the input set of words w can literally be big data,1 we believe that halving the processing time is a substantial benefit.

Our technical contributions are concretely as follows: (1) a (naive) algorithm for timed pattern matching (Sect. 4); (2) its online variant (Sect. 4); (3) a proof that the match set allows a finitary representation (Theorem 4.3), much like in [20]; and (4) an algorithm with Boyer-Moore type optimization (Sect. 5). Throughout the paper we let (timed) patterns be expressed as timed automata (TA), unlike the timed regular expressions (TREs) in [10, 20, 21]. Besides the fact that TAs are known to be strictly more expressive than TREs (see [12] and also Case 2 of Sect. 6), our principal reason for choosing TAs is that the Boyer-Moore type pattern matching algorithm in [24] smoothly extends to them.

Related and Future Work. The context of the current work is run-time verification and monitoring of cyber-physical systems, a field of growing research activities (see e.g. recent [11, 14]). One promising application is in conditional quantitative analysis [10], e.g. of fuel consumption of a car during acceleration, from a large data set of driving record. Here our results can be used to efficiently isolate the acceleration phases.

Aside from timed automata and TREs, metric and signal temporal logics (MTL/STL) are commonly used for specifying continuous-time signals. Monitoring against these formalisms has been actively studied, too [7, 8, 9, 13]. It is known that an MTL formula can be translated to a timed alternating automaton [18]. MTL/STL tend to be used against “smooth” signals whose changes are continuous, however, and it is not clear how our current results (on time-stamped finite words) would apply to such a situation. One possible practical approach would be to quantize continuous-time signals.

Being online—to process a long timed word w one can already start with its prefix—is obviously a big advantage in monitoring algorithms. In [21] an online timed pattern matching algorithm (where a specification is a TRE) is given, relying on the timed extension of Brzozowski derivative. We shall aim at an online version of our Boyer-Moore type algorithm (our online algorithm in Sect. 4 is without the Boyer-Moore type optimization), although it seems hard already for the prototype problem of string matching.

It was suggested by multiple reviewers that use of zone automata can further enhance our Boyer-Moore type algorithm for timed pattern matching. See Remark 5.6.

Organization of the Paper. We introduce the necessary background in Sect. 2: the basic theory of timed automata, and the previous Boyer-Moore algorithms (for string matching, and the one in [24] for (untimed) pattern matching). The latter will pave the way to our main contribution of the timed Boyer-Moore algorithm. We formulate the timed pattern matching problem in Sect. 3; a (naive) algorithm is presented in Sect. 4 together with its online variant. In Sect. 5 a Boyer-Moore algorithm for timed pattern matching is described, drawing intuitions from the untimed one and emphasizing where the differences lie. In Sect. 6 we present the experimental results; they indicate the potential of the proposed algorithm in real-world monitoring applications.

Most proofs are deferred to the appendix in [23] due to lack of space.

2 Preliminaries

2.1 Timed Automata

Here we follow [1, 3], possibly with a fix to accept finite words instead of infinite ones. For a sequence \(\overline{s}=s_1 s_2 \cdots s_{n}\) we write \(|\overline{s}|=n\); and for i, j such that \(1 \le i \le j \le |\overline{s}|\), \(\overline{s} (i)\) denotes the element \(s_i\) and \(\overline{s}(i,j)\) denotes the subsequence \(s_i s_{i+1} \cdots s_j\).

Definition 2.1

(timed word). A timed word over an alphabet \(\varSigma \) is an element of \((\varSigma \times \mathbb {R}_{> 0})^*\)—which is denoted by \((\overline{a},\overline{\tau })\) using \(\overline{a}\in \varSigma ^{*}\), \(\overline{\tau }\in (\mathbb {R}_{> 0})^{*}\) via the embedding \((\varSigma \times \mathbb {R}_{>0})^* \hookrightarrow \varSigma ^* \times (\mathbb {R}_{>0})^*\)—such that for any \(i\in [1,|\overline{\tau }|-1]\) we have \(0< \tau _i < \tau _{i+1}\). Let \((\overline{a},\overline{\tau })\) be a timed word and \(t \in \mathbb {R}\) be such that \(-\tau _1 < t\). The t-shift \((\overline{a},\overline{\tau }) + t\) of \((\overline{a},\overline{\tau })\) is the timed word \((\overline{a},\overline{\tau }+t)\), where \(\overline{\tau }+t\) is the sequence \(\tau _1+t,\tau _2+t,\cdots ,\tau _{|\overline{\tau }|}+t\). Let \((\overline{a},\overline{\tau })\) and \((\overline{a'},\overline{\tau '})\) be timed words over \(\varSigma \) such that \(\tau _{|\overline{\tau }|} < \tau '_{1}\). Their absorbing concatenation \((\overline{a},\overline{\tau }) \circ (\overline{a'},\overline{\tau '})\) is defined by \( (\overline{a},\overline{\tau }) \circ (\overline{a'},\overline{\tau '}) = (\overline{a}\circ \overline{a'}, \overline{\tau }\circ \overline{\tau '}) \), where \(\overline{a}\circ \overline{a'}\) and \(\overline{\tau }\circ \overline{\tau '}\) denote (usual) concatenation of sequences over \(\varSigma \) and \(\mathbb {R}_{>0}\), respectively. Now let \((\overline{a},\overline{\tau })\) and \((\overline{a''},\overline{\tau ''})\) be arbitrary timed words over \(\varSigma \). Their non-absorbing concatenation \((\overline{a},\overline{\tau }) \cdot (\overline{a''},\overline{\tau ''})\) is defined by \((\overline{a},\overline{\tau }) \cdot (\overline{a''},\overline{\tau ''}) = (\overline{a},\overline{\tau }) \circ ((\overline{a''},\overline{\tau ''}) + \tau _{|\overline{\tau }|})\).
A timed language over an alphabet \(\varSigma \) is a set of timed words over \(\varSigma \).
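The operations of Definition 2.1 can be sketched in code as follows. This is a minimal illustration, assuming a timed word is encoded as a pair (letters, times) with strictly increasing positive time stamps; the function names (t_shift, concat_absorbing, concat_nonabsorbing) are ours, not the paper's.

```python
def t_shift(word, t):
    """The t-shift (a, tau) + t: add t to every time stamp (requires -tau_1 < t)."""
    letters, times = word
    assert not times or -times[0] < t  # resulting stamps must stay positive
    return (letters, [tau + t for tau in times])

def concat_absorbing(w1, w2):
    """Absorbing concatenation: the stamps of w2 must start after w1 ends."""
    (a1, t1), (a2, t2) = w1, w2
    assert not t1 or not t2 or t1[-1] < t2[0]
    return (a1 + a2, t1 + t2)

def concat_nonabsorbing(w1, w2):
    """Non-absorbing concatenation: shift w2 by the last stamp of w1, then concatenate."""
    (a1, t1) = w1
    return concat_absorbing(w1, t_shift(w2, t1[-1] if t1 else 0.0))
```

For instance, the non-absorbing concatenation of (ab, (0.5, 1.0)) and (c, (0.25)) first shifts the second word by 1.0 and then concatenates, yielding (abc, (0.5, 1.0, 1.25)).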

Remark 2.2

(signal). A signal is another formalization of records with a notion of time, used e.g. in [20]; a signal over \(\varSigma \) is a function \(\mathbb {R}_{\ge 0} \rightarrow \varSigma \). A timed word describes a time-stamped sequence of events, while a signal describes values from \(\varSigma \) that change over time. In this paper we shall work with timed words. This is for technical reasons and is not important from the applicational point of view: when we restrict to those signals which exhibit only finitely many changes, there is a natural correspondence between such signals and timed words.

Let C be a (fixed) finite set of clock variables. The set \(\varPhi (C)\) of clock constraints is defined by the following BNF notation.
$$ \varPhi (C) \;\ni \; \delta \;::=\; x < c \mid x > c \mid x \le c \mid x \ge c \mid \mathbf{true }\mid \delta \wedge \delta \quad \text {where}~x \in C~\text {and}~c \in \mathbb {Z}_{\ge 0}\text {.} $$
Absence of \(\vee \) or \(\lnot \) does not harm expressivity: \(\vee \) can be emulated with nondeterminism (see Definition 2.3); and \(\lnot \) can be propagated down to atomic formulas by the de Morgan laws. Restriction to \(\mathbf{true }\) and \(\wedge \) is technically useful, too, when we deal with intervals and zones (Definition 4.1).

A clock interpretation \(\nu \) over the set C of clock variables is a function \(\nu : C \rightarrow \mathbb {R}_{\ge 0}\). Given a clock interpretation \(\nu \) and \(t \in \mathbb {R}_{\ge 0}\), \(\nu + t\) denotes the clock interpretation that maps a clock variable \(x\in C\) to \(\nu (x) + t\).
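Clock constraints and clock interpretations can be sketched as follows. The encoding of a constraint as a list of atoms (x, op, c) with conjunction implicit, and \(\mathbf{true}\) as the empty list, is our assumption for illustration only.

```python
import operator

# comparison operators allowed in Phi(C)
OPS = {'<': operator.lt, '>': operator.gt, '<=': operator.le, '>=': operator.ge}

def satisfies(nu, delta):
    """Does the clock interpretation nu : C -> R>=0 satisfy the conjunction delta?
    delta is a list of atoms (x, op, c); the empty list encodes 'true'."""
    return all(OPS[op](nu[x], c) for (x, op, c) in delta)

def shift(nu, t):
    """nu + t: advance every clock by t."""
    return {x: v + t for x, v in nu.items()}
```

For example, the interpretation mapping x to 1.5 satisfies x < 2 ∧ x ≥ 1, and shifting it by 1.0 yields the interpretation mapping x to 2.5.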

Definition 2.3

(timed automaton). A timed automaton (TA) \(\mathcal {A}\) is a tuple \((\varSigma ,S,S_0,C,E,F)\) where: \(\varSigma \) is a finite alphabet; S is a finite set of states; \(S_0 \subseteq S\) is the set of initial states; C is the set of clock variables; \(E \subseteq S \times S \times \varSigma \times \mathcal {P}(C) \times \varPhi (C)\) is the set of transitions; and \(F \subseteq S\) is the set of accepting states.

The intuition for \((s,s',a,\lambda ,\delta )\in E\) is: from s, also assuming that the clock constraint \(\delta \) is satisfied, we can move to the state \(s'\) conducting the action a and resetting the value of each clock variable \(x\in \lambda \) to 0. Examples of TAs are in (7) and Figs. 7, 8, 9, 10 and 11 later.

The above notations (as well as the ones below) follow those in [1]. In the following definition (1) of run, for example, the first transition occurs at (absolute) time \(\tau _{1}\) and the second occurs at time \(\tau _{2}\); it is implicit that we stay at the state \(s_{1}\) for time \(\tau _{2}-\tau _{1}\).

Definition 2.4

(run). A run of a timed automaton \(\mathcal {A} = (\varSigma ,S,S_0,C,E,F)\) over a timed word \((\overline{a},\overline{\tau }) \in (\varSigma \times \mathbb {R}_{>0})^*\) is a pair \((\overline{s},\overline{\nu }) \in S^* \times ((\mathbb {R}_{\ge 0})^C)^*\) of a sequence \(\overline{s}\) of states and a sequence \(\overline{\nu }\) of clock interpretations, subject to the following conditions: (1) \(|\overline{s}| = |\overline{\nu }| = |\overline{a}| + 1\); (2) \(s_0 \in S_0\), and for any \(x \in C\), \(\nu _0 (x) = 0\); and (3) for any \(i\in [0,|\overline{a}|-1]\) there exists a transition \((s_i,s_{i+1},a_{i+1},\lambda ,\delta )\in E\) such that the clock constraint \(\delta \) holds under the clock interpretation \(\nu _i + (\tau _{i+1} - \tau _{i})\) (here \(\tau _{0}\) is defined to be 0), and the clock interpretation \(\nu _{i+1}\) has it that \(\nu _{i+1} (x) = \nu _i (x) + \tau _{i+1} - \tau _{i}\) (if \(x \notin \lambda \)) and \(\nu _{i+1} (x) = 0\) (if \(x\in \lambda \)). This run is depicted as follows.
$$\begin{aligned} (s_0,\nu _0) \mathop {\longrightarrow }\limits ^{(a_1,\tau _1)} (s_1,\nu _1) \mathop {\longrightarrow }\limits ^{(a_2,\tau _2)} \cdots \longrightarrow (s_{|\overline{a}|-1},\nu _{|\overline{\tau }|-1}) \mathop {\longrightarrow }\limits ^{(a_{|\overline{a}|},\tau _{|\overline{\tau }|})} (s_{|\overline{a}|},\nu _{|\overline{\tau }|}) \end{aligned}$$
(1)
Such a run \((\overline{s},\overline{\nu })\) of \(\mathcal {A}\) is accepting if \(s_{|\overline{s}|-1} \in F\). The language \(L (\mathcal {A})\) of \(\mathcal {A}\) is defined by \(L (\mathcal {A}) = \{w \mid \text {there is an accepting run of}~\mathcal {A}~\text {over}~w\}\).

There is another specification formalism for timed languages called timed regular expressions (TREs) [2, 3]. Unlike in the classic Kleene theorem, in the timed case timed automata are strictly more expressive than TREs. See [12, Proposition 2].

The region automaton is an important theoretical gadget in the theory of timed automata: it reduces the domain \(S \times (\mathbb {R}_{\ge 0})^C\) of pairs \((s,\nu )\) in (1)—an infinite set—to a finite abstraction, the latter being amenable to algorithmic treatment. Specifically it relies on an equivalence relation \(\sim \) over clock interpretations. Given a timed automaton \(\mathcal {A} = (\varSigma ,S,S_0,C,E,F)\)—where, without loss of generality, we assume that each clock variable \(x \in C\) appears in at least one clock constraint in E—let \(c_x\) denote the greatest number that is compared with x in the clock constraints in E. (Precisely: \(c_{x}=\max \{c\in \mathbb {Z}_{\ge 0}\mid x\bowtie c~\text {occurs in}~E,~\text {where}~{\bowtie }\in \{<,>,\le ,\ge \}\}\).) Writing \(\mathrm {int}(\tau )\) and \(\mathrm {frac}(\tau )\) for the integer and fractional parts of \(\tau \in \mathbb {R}_{\ge 0}\), an equivalence relation \(\sim \) over clock interpretations \({\nu ,\nu {'}}\) is defined as follows. We have \({\nu \sim \nu {'}}\) if:
  • for each \(x\in C\) we have \(\mathrm {int}(\nu (x)) = \mathrm {int}(\nu '(x))\) or (\(\nu (x) > c_x\) and \(\nu ' (x) > c_x\));

  • for any \(x,y \in C\) such that \(\nu (x) \le c_x\) and \(\nu (y) \le c_y\), \(\mathrm {frac}(\nu (x)) < \mathrm {frac}(\nu (y))\) if and only if \(\mathrm {frac}(\nu '(x)) < \mathrm {frac}(\nu '(y))\); and

  • for any \(x {\in } C\) such that \(\nu (x) \le c_x\), \(\mathrm {frac}(\nu (x)) = 0\) if and only if \(\mathrm {frac}(\nu '(x)) = 0\).

A clock region is an equivalence class of clock interpretations modulo \(\sim \); as usual the equivalence class of \(\nu \) is denoted by \([\nu ]\). Let \(\alpha ,\alpha '\) be clock regions. We say \(\alpha '\) is a time-successor of \(\alpha \) if for any \(\nu \in \alpha \), there exists \(t \in \mathbb {R}_{> 0 }\) such that \(\nu + t \in \alpha '\).
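The three conditions defining \(\sim \) translate directly into a membership check; a sketch, assuming cmax maps each clock x to the bound \(c_x\) (function name and encoding are ours):

```python
import math

def region_equiv(nu1, nu2, cmax):
    """Check nu1 ~ nu2 following the three conditions in the text."""
    C = list(cmax)
    frac = lambda v: v - math.floor(v)
    # (1) equal integer parts, or both beyond c_x
    for x in C:
        if not (math.floor(nu1[x]) == math.floor(nu2[x])
                or (nu1[x] > cmax[x] and nu2[x] > cmax[x])):
            return False
    # (2) the ordering of fractional parts agrees on relevant clocks
    for x in C:
        for y in C:
            if nu1[x] <= cmax[x] and nu1[y] <= cmax[y]:
                if (frac(nu1[x]) < frac(nu1[y])) != (frac(nu2[x]) < frac(nu2[y])):
                    return False
    # (3) zero fractional parts agree on relevant clocks
    for x in C:
        if nu1[x] <= cmax[x] and (frac(nu1[x]) == 0) != (frac(nu2[x]) == 0):
            return False
    return True
```

For example, with \(c_x = c_y = 2\), the interpretations (x = 0.3, y = 0.7) and (x = 0.4, y = 0.9) are equivalent (same integer parts, same fractional-part ordering, no zero fractions), while (x = 0.3, y = 0.7) and (x = 0.8, y = 0.4) are not.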

Definition 2.5

(region automaton). For a timed automaton \(\mathcal {A} = (\varSigma ,S,S_0,C,E,F)\), the region automaton \(R (\mathcal {A})\) is the NFA \((\varSigma ,S',S'_0,E',F')\) defined as follows: \(S' = S \times \bigl ((\mathbb {R}_{\ge 0})^{C}/{\sim }\bigr )\); on initial states \(S'_0 = \{(s,[\nu ]) \mid s \in S_0, \nu (x) = 0~\text {for each}~x\in C\}\); on accepting states \(F' = \{(s,\alpha ) \in S' \mid s \in F\}\). The transition relation \(E'\subseteq S'\times S'\times \varSigma \) is defined as follows: \(((s,\alpha ),(s',\alpha '),a) \in E'\) if there exist a clock region \(\alpha ''\) and \((s,s',a,\lambda ,\delta ) \in E\) such that
  • \(\alpha ''\) is a time-successor of \(\alpha \), and

  • for each \(\nu \in \alpha ''\), (1) \(\nu \) satisfies \(\delta \), and (2) there exists \(\nu ' \in \alpha '\) such that \(\nu (x) = \nu ' (x)\) (if \(x\notin \lambda \)) and \(\nu ' (x) = 0\) (if \(x\in \lambda \)).

It is known [1] that the region automaton \(R(\mathcal {A})\) indeed has finitely many states.

The following notation for NFAs will be used later.

Definition 2.6

(\( Runs _{\mathcal {A}} (s,s')\)). Let \(\mathcal {A}\) be an NFA over \(\varSigma \), and s and \(s'\) be its states. We let \( Runs _{\mathcal {A}} (s,s')\) denote the set of runs from s to \(s'\), that is, \( Runs _{\mathcal {A}} (s,s') = \{s_{0}s_{1}\cdots s_{n}\mid n\in \mathbb {Z}_{\ge 0}, s_{0}=s, s_{n}=s', \forall i .\,\exists a_{i+1} .\, s_{i}\mathop {\rightarrow }\limits ^{a_{i+1}} s_{i+1} \text { in}~\mathcal {A}\} \).

2.2 String Matching and the Boyer-Moore Algorithm

In Sects. 2.2 and 2.3 we shall revisit the Boyer-Moore algorithm and its adaptation for pattern matching [24]. We do so in considerable detail, so as to provide both technical and intuitional bases for our timed adaptation.
Fig. 1.

The string matching problem

String matching is a fundamental operation on strings: given an input string \( str \) and a pattern string \( pat \), it asks for the match set \(\bigl \{(i,j)\,\bigl |\bigr .\, str (i,j)= pat \bigr \}\). An example (from [17]) is in Fig. 1, where the answer is \(\{(18,24)\}\).

A brute-force algorithm has the complexity \(O(| str || pat |)\); known optimizations include ones by Knuth, Morris, and Pratt [15] and by Boyer and Moore [5]. The former performs better in the worst case, but for practical instances the latter is commonly used. Let us now demonstrate how the Boyer-Moore algorithm for string matching works, using the example in Fig. 1. Its main idea is to skip unnecessary matching of characters, using two skip value functions \(\varDelta _{1}\) and \(\varDelta _{2}\) (that we define later).

The basic scheme of the Boyer-Moore algorithm is that the pattern string \( pat \) moves from left to right, while matching between the input string \( str \) and \( pat \) is conducted from right to left. The initial configuration is shown in (2), and we set out by comparing the characters \( str (7)\) and \( pat (7)\). They turn out to be different.

A naive algorithm would then move the pattern to the right by one position. We can do better, however, realizing that the character \( str (7)=\mathrm {S}\) (that we already read for comparison) never occurs in the pattern \( pat \). This means the position 7 cannot belong to any matching interval (ij), and we thus jump to the configuration (3). Formally this argument is expressed by the value \(\varDelta _{1}(\mathrm {S},7)=7\) of the first skip value function \(\varDelta _{1}\), as we will see later.

Here again we compare characters from right to left, in (3), realizing immediately that \( str (14)\ne pat (7)\). It is time to shift the pattern; given that \( str (14)=\mathrm {P}\) occurs as \( pat (5)\), we shift the pattern by \(\varDelta _{1}(\mathrm {P},7)=7-5=2\).

We are now in the configuration (4), and some initial matching succeeds (\( str (16)= pat (7)\), \( str (15)= pat (6)\), and so on). The matching fails for \( str (12)\ne pat (3)\). Following the same reasoning as above—the character \( str (12)=\mathrm {I}\) does not occur in \( pat (3)\), \( pat (2)\) or \( pat (1)\)—we would then shift the pattern by \(\varDelta _{1}(\mathrm {I},3)=3\).
Fig. 2.

Table for computing \(\varDelta _{2}\)

However we can do even better. Consider the table in Fig. 2, where we forget about the input \( str \) and shift the pattern \( pat \) one by one, trying to match it with \( pat \) itself. We are specifically interested in the segment \(\mathrm {MPLE}\) from \( pat (4)\) to \( pat (7)\) (underlined in the first row)—because it is the partial match we have discovered in the configuration (4). The table shows that we need to shift by at least 6 to get a potential match (the last row); hence from the configuration (4) we can shift the pattern \( pat \) by 6, which is more than the skip value above (\(\varDelta _{1}(\mathrm {I},3)=3\)). This argument—different from the one for \(\varDelta _{1}\)—is formalized as the second skip value function: \(\varDelta _{2}(3)=6\).

We are thus led to the next configuration, only to find that the first matching trial fails (\( str (22)\ne pat (7)\)). Since \( str (22)=\mathrm {P}\) occurs in \( pat \) as \( pat (5)\), we shift \( pat \) by \(\varDelta _{1}(\mathrm {P},7)= 2\). This finally brings us to the configuration in Fig. 1 and the match set \(\{(18,24)\}\).

Summarizing, the key in the Boyer-Moore algorithm is to use two skip value functions \(\varDelta _{1},\varDelta _{2}\) to shift the pattern faster than one-by-one. The precise definition of \(\varDelta _{1},\varDelta _{2}\) is in Appendix A in [23], for reference.
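The first skip value function \(\varDelta _{1}\) used in the walkthrough above can be sketched as the familiar bad-character rule; this is an illustration consistent with the skip values in the example (the paper's precise definition is in Appendix A of [23]), and the running pattern EXAMPLE is assumed.

```python
def boyer_moore_delta1(pat):
    """Build Delta_1 for the pattern: Delta_1(ch, k) is how far the pattern may
    shift when pat(k) mismatches the input character ch (1-based positions)."""
    last = {}  # rightmost 1-based position of each character in pat
    for k, ch in enumerate(pat, start=1):
        last[ch] = k
    def delta1(ch, k):
        # shift so the rightmost occurrence of ch aligns with position k;
        # a character absent from pat (last.get -> 0) lets us skip k positions
        return max(1, k - last.get(ch, 0))
    return delta1
```

With pat = EXAMPLE this reproduces the skip values of the walkthrough: Δ1(S, 7) = 7, Δ1(P, 7) = 2, and Δ1(I, 3) = 3.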

2.3 Pattern Matching and a Boyer-Moore Type Algorithm

Pattern matching is another fundamental operation that generalizes string matching: given an input string \( str \) and a regular language L as a pattern, it asks for the match set \(\bigl \{(i,j)\,\bigl |\bigr .\, str (i,j)\in L \bigr \}\). For example, for \( str \) in Fig. 1 and the pattern \(\mathrm {[A-Z]}^{*}\mathrm {MPLE}\), the match set is \(\{(11,16),(18,24)\}\). In [24] an algorithm for pattern matching is introduced that employs “Boyer-Moore type” optimization, much like the use of \(\varDelta _{2}\) in Sect. 2.2.
Fig. 3.

The automaton \(\mathcal {A}\) and skip values

Fig. 4.

A brute-force algorithm

Let \( str =\mathrm {cbadcdc}\) be an input string and \(\mathrm {dc}^*\{\mathrm {ba} \mid \mathrm {dc}\}\) be a pattern L, for example. We can solve pattern matching by the following brute-force algorithm.

  • We express the pattern as an NFA \(\mathcal {A}=(\varSigma ,S,S_{0},E,F)\) in Fig. 3. We reverse words—\(w\in L\) if and only if \(w^{ Rev }\in L(\mathcal {A})\)—following the Boyer-Moore algorithm (Sect. 2.2), which matches a segment of the input and the pattern from right to left.

  • Also following Sect. 2.2 we “shift the pattern from left to right.” This technically means: we conduct the following for \(j=1\) first, then \(j=2\), and so on. For fixed j we search for \(i\in [1,j]\) such that \( str (i,j)\in L\). This is done by computing the set \(S_{(i,j)}\) of reachable states of \(\mathcal {A}\) when it is fed with \( str (i,j)^{ Rev }\). The computation is done step-by-step, decrementing i from j to 1:
    $$\begin{aligned} S_{(j,j)}=\{s\mid \exists s'\in S_{0}.\, s'\mathop {\rightarrow }\limits ^{ str (j)} s\}, \;\text {and}\; S_{(i,j)}=\{s\mid \exists s'\in S_{(i+1,j)}.\, s'\mathop {\rightarrow }\limits ^{ str (i)} s\}. \end{aligned}$$

Then (i, j) is a matching interval if and only if \(S_{(i,j)}\cap F\ne \emptyset \). See Fig. 4.
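The brute-force procedure above can be sketched as follows, assuming the NFA is given by a successor function nfa_delta(s, ch); the function name match_set and the encoding are ours.

```python
def match_set(str_, nfa_delta, S0, F):
    """For each j, decrement i from j to 1, maintaining the reachable-state set
    S_(i,j) of the reversed-word NFA A fed with str(i,j)^Rev (1-based i, j)."""
    matches = set()
    n = len(str_)
    for j in range(1, n + 1):
        S = set()
        for i in range(j, 0, -1):
            ch = str_[i - 1]
            src = S0 if i == j else S  # S_(j,j) starts from the initial states
            S = {t for s in src for t in nfa_delta(s, ch)}
            if S & F:
                matches.add((i, j))
            if not S:  # no reachable state; smaller i cannot match either
                break
    return matches
```

For instance, with a two-transition NFA accepting the reversed word "ba" (i.e. the pattern L = {ab}), the input "cab" yields the single matching interval (2, 3).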

The Boyer-Moore type optimization in [24] tries to hasten the shift of j. A key observation is as follows. Assume \(j=4\); then the above procedure would feed \( str (1,4)^{ Rev }=\mathrm {dabc}\) to the automaton \(\mathcal {A}\) (Fig. 3). We instantly see that this would not yield any matching interval—for a word to be accepted by \(\mathcal {A}\) it must start with abd, cdd, abc or cdc.

Precisely, the algorithm in [24] works as follows. We first observe that the length of a shortest word accepted by \(\mathcal {A}\) is 3; therefore we can start right away with \(j=3\), skipping \(j=1,2\) in the above brute-force algorithm. Unfortunately \( str (1,3)=\mathrm {cba}\) does not match L, with \( str (1,3)^{ Rev }=\mathrm {abc}\) only leading to \(\{s_{3}\}\) in \(\mathcal {A}\).
Fig. 5.

Table for \(\varDelta _{2}(s_{3})\)

We now shift j by 2, from 3 to 5, following Fig. 5. Here
$$\begin{aligned} L'=\bigl \{\,\bigl (w(1,3)\bigr )^{ Rev }\,\bigl |\bigr .\, w\in L(\mathcal {A})\,\bigr \}=\{ \mathrm {dba}, \mathrm {ddc}, \mathrm {cba}, \mathrm {cdc} \}; \end{aligned}$$
(6)
that is, for \( str (i,j)\) to match the pattern L its last three characters must match \(L'\). Our previous “key observation” now translates to the fact that \( str (2,4)=\mathrm {bad}\) does not belong to \(L'\); in the actual algorithm in [24], however, we do not use the string \( str (2,4)\) itself. Instead we overapproximate it with the information that feeding \(\mathcal {A}\) with \( str (1,3)^{ Rev }=\mathrm {abc}\) led to \(\{s_{3}\}\). Similarly to the case with \(L'\), this implies that the last two characters of \( str (1,3)\) must have been in \(L'_{s_{3}}=\{\mathrm {ba},\mathrm {dc}\}\). The table shows that none in \(L'_{s_{3}}\) matches any of \(L'\) when j is shifted by 1; when j is shifted by 2, we have matches (underlined). Therefore we jump from \(j=3\) to \(j=5\).

This is how the algorithm in [24] works: it accelerates the brute-force algorithm in Fig. 4, skipping some j’s, with the help of a skip value function \(\varDelta _{2}\). The overapproximation in the last paragraph allows \(\varDelta _{2}\) to rely only on a pattern L (but not on an input string \( str \)); this means that pre-processing is done once we fix the pattern L, and it is reused for various input strings \( str \). This is an advantage in monitoring applications where one would deal with a number of input strings \( str \), some of which are yet to come. See Appendix B in [23] for the precise definition of the skip value function \(\varDelta _{2}\).

In Fig. 3 we annotate each state s with the values \(m_{s}\) and \(L'_{s}\) that are used in computing \(\varDelta _{2}(\{s\})\). Here \( m_{s} \) is the length of a shortest word that leads to s; \( m = \min _{s\in F}m_{s} \) (which is 3 in the above example); and \( L'_{s} = \{w(1,\min \{m_{s},m\})^{ Rev }\mid w\in L(\mathcal {A}_{s})\} \).
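The values \(m_{s}\) are shortest-path distances in the NFA's transition graph and can be computed by breadth-first search; a sketch, with the function name shortest_dists and the successor-function encoding assumed by us.

```python
from collections import deque

def shortest_dists(nfa_delta, S0, alphabet):
    """Compute m_s, the length of a shortest word leading to each reachable
    state s from the initial states S0, by breadth-first search."""
    m = {s: 0 for s in S0}
    queue = deque(S0)
    while queue:
        s = queue.popleft()
        for ch in alphabet:
            for t in nfa_delta(s, ch):
                if t not in m:  # first visit is at the shortest distance
                    m[t] = m[s] + 1
                    queue.append(t)
    return m
```

The minimum of m over the accepting states then gives the value m used to decide the starting position of j, as in Sect. 2.3.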

It is not hard to generalize the other skip value function \(\varDelta _1\) in Sect. 2.2 for pattern matching, too: instead of \( pat \) we use the set \(L'\) in the above (6). See Appendix B in [23].

3 The Timed Pattern Matching Problem

Here we formulate our problem, following the notations in Sect. 2.1.

Given a timed word w, the timed word segment \(w|_{(t,t')}\) is the result of clipping the parts outside the open interval \((t,t')\). For example, for \(w = \bigl ((a,b,c),(0.7,1.2,1.5)\bigr )\), we have \(w|_{(1.0,1.7)} = \bigl ((b,c,\$),(0.2,0.5,0.7)\bigr )\), \(w|_{(1.0,1.5)} = \bigl ((b,\$),(0.2,0.5)\bigr )\) and \(w|_{(1.2,1.5)} = \bigl ((\$),(0.3)\bigr )\). Here the (fresh) terminal character \({\$}\) designates the end of a segment. Since we use open intervals \((t,t')\), for example, the word \(w|_{(1.2,1.5)}\) does not contain the character c at time 0.3. The formal definition is as follows.

Definition 3.1

(timed word segment). Let \(w = (\overline{a},\overline{\tau })\) be a timed word over \(\varSigma \), t and \(t'\) be reals such that \(0 \le t < t'\), and i and j be indices such that \(\tau _{i-1} \le t < \tau _{i}\) and \(\tau _j < t' \le \tau _{j+1}\) (we let \(\tau _0=0\) and \(\tau _{|\overline{\tau }|+1}=\infty \)). The timed word segment \(w|_{(t,t')}\) of w on the interval \((t,t')\) is the timed word \((\overline{a'},\overline{\tau '})\), over the extended alphabet \(\varSigma \amalg \{\$\}\), defined as follows: (1) \(\left| w|_{(t,t')} \right| = j-i+2\); (2) we have \(a'_k = a_{i+k-1}\) and \(\tau '_k = \tau _{i+k-1} - t\) for \(k\in [1, j - i + 1]\); and (3) \(a'_{j-i+2} = \$ \) and \(\tau '_{j-i+2} = t'-t\).
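Definition 3.1 amounts to keeping the events strictly inside the open interval, shifting their stamps, and appending the terminal character. A minimal sketch (the function name segment is ours):

```python
def segment(word, t, t_prime):
    """The timed word segment w|_(t,t'): keep events strictly inside the open
    interval (t, t'), shift stamps by -t, and append the terminal character '$'
    at relative time t' - t."""
    letters, times = word
    a, tau = [], []
    for ch, s in zip(letters, times):
        if t < s < t_prime:
            a.append(ch)
            tau.append(s - t)
    return (a + ['$'], tau + [t_prime - t])
```

This reproduces the examples above: for w = ((a, b, c), (0.7, 1.2, 1.5)) we get w|_(1.0,1.7) = ((b, c, $), (0.2, 0.5, 0.7)) and w|_(1.2,1.5) = (($), (0.3)), the events at times 1.2 and 1.5 being excluded by the open interval.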

Definition 3.2

(timed pattern matching). The timed pattern matching problem (over an alphabet \(\varSigma \)) takes (as input) a timed word w over \(\varSigma \) and a timed automaton \(\mathcal {A}\) over \(\varSigma \amalg \{\$\}\); and it requires (as output) the match set \(\mathcal {M} (w,\mathcal {A}) = \bigl \{(t,t') \in (\mathbb {R}_{\ge 0})^{2} \mid t < t', w|_{(t,t')} \in L (\mathcal {A})\bigr \}\).

Our formulation in Definition 3.2 slightly differs from that in [20] in that: (1) we use timed words in place of signals (Remark 2.2); (2) for specification we use timed automata rather than timed regular expressions; and (3) we use an explicit terminal character \({\$}\). While none of these differences is major, introduction of \({\$}\) enhances expressivity, e.g. in specifying “an event a occurs, and after that, no other event occurs within 2s.” (see (7)). It is also easy to ignore \({\$}\)—when one is not interested in it—by having the clock constraint true on the \({\$}\)-labeled transitions leading to the accepting states.

Assumption 3.3

In what follows we assume the following: each timed automaton \(\mathcal {A}\) over the alphabet \(\varSigma \amalg \{\$\}\) is such that every \({\$}\)-labeled transition is into an accepting state, and no other transition is \({\$}\)-labeled; moreover there is no transition out of any accepting state.

4 A Naive Algorithm and Its Online Variant

Here we present a naive algorithm for timed pattern matching (without a Boyer-Moore type optimization), also indicating how to make it into an online one. Let us fix a timed word w over \(\varSigma \) and a timed automaton \(\mathcal {A} = (\varSigma \amalg \{\$\},S,S_0,C,E,F)\) as the input.

First of all, a match set (Definition 3.2) is in general an infinite set, and we need its finitary representation for an algorithmic treatment. We follow [20] and use (2-dimensional) zones for that purpose.

Definition 4.1

(zone). Consider the 2-dimensional plane \(\mathbb {R}^{2}\) whose axes are denoted by t and \(t'\). A zone is a convex polyhedron specified by constraints of the form \(t\bowtie c\), \(t'\bowtie c\) and \(t'-t\bowtie c\), where \({\bowtie } \in \{<,>,\le ,\ge \}\) and \(c \in \mathbb {Z}_{\ge 0}\).

It is not hard to see that each zone is specified by three intervals (that may or may not include their endpoints): \(T_{0}\) for t, \(T_{f}\) for \(t'\) and \(T_{\varDelta }\) for \(t'-t\). We let a triple \((T_{0},T_{f},T_{\varDelta })\) represent a zone, and write \((t,t') \in (T_0,T_f,T_{\varDelta })\) if \(t \in T_0\), \(t' \in T_f\) and \(t' - t \in T_{\varDelta }\).

In our algorithms we shall use the following constructs.

Definition 4.2

(\(\mathrm {reset},\mathrm {eval},\mathrm {solConstr},\rho _{\emptyset }, Conf \)). Let \(\rho :C\rightharpoonup \mathbb {R}_{>0}\) be a partial function that carries a clock variable \(x\in C\) to a positive real; the intention is that x was reset at time \(\rho (x)\) (in the absolute clock). Let \(x\in C\) and \(t_{r}\in \mathbb {R}_{>0}\); then the partial function \(\mathrm {reset}(\rho ,x,t_r):C\rightharpoonup \mathbb {R}_{>0}\) is defined by: \(\mathrm {reset}(\rho ,x,t_r)(x)=t_{r}\) and \(\mathrm {reset}(\rho ,x,t_r)(y)=\rho (y)\) for each \(y\in C\) such that \(y\ne x\). (The last is Kleene’s equality between partial functions, to be precise.)

Now let \(\rho \) be as above, and \(t,t_{0}\in \mathbb {R}_{\ge 0}\), with the intention that t is the current (absolute) time and \(t_{0}\) is the epoch (absolute) time for a timed word segment \(w|_{(t_{0},t')}\). We further assume \(t_{0}\le t\) and \(t_{0}\le \rho (x)\le t\) for each \(x\in C\) for which \(\rho (x)\) is defined. The clock interpretation \(\mathrm {eval}(\rho ,t,t_0):C\rightarrow \mathbb {R}_{\ge 0}\) is defined by: \(\mathrm {eval}(\rho ,t,t_0)(x)=t-\rho (x)\) (if \(\rho (x)\) is defined); and \(\mathrm {eval}(\rho ,t,t_0)(x)=t-t_{0}\) (if \(\rho (x)\) is undefined).

For intervals \(T,T'\subseteq \mathbb {R}_{\ge 0}\), a partial function \(\rho :C\rightharpoonup \mathbb {R}_{>0}\) and a clock constraint \(\delta \) (Sect. 2.1), we define \(\mathrm {solConstr}(T,T',\rho ,\delta )=\bigl \{\,(t,t')\,\bigl |\bigr .\,t\in T, t'\in T', \mathrm {eval}(\rho ,t',t)\models \delta \,\bigr \}\).

We let \(\rho _{\emptyset }:C\rightharpoonup \mathbb {R}_{>0}\) denote the partial function that is nowhere defined.
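For concreteness (again as our own sketch, with hypothetical names), the constructs \(\mathrm {reset}\) and \(\mathrm {eval}\) of Definition 4.2 can be modeled in Python, with a dictionary standing for a partial function \(\rho \) and a missing key playing the role of "undefined":

```python
# A partial function rho: C -> R_{>0} is modeled as a dict; a missing key
# means rho(x) is undefined (the clock x has not been reset yet).

def reset(rho, x, t_r):
    """reset(rho, x, t_r): record that clock x was reset at absolute time t_r."""
    out = dict(rho)
    out[x] = t_r
    return out

def evaluate(rho, t, t0, clocks):
    """eval(rho, t, t0): the clock interpretation at absolute time t, for a
    timed word segment whose epoch (absolute) time is t0."""
    return {x: (t - rho[x]) if x in rho else (t - t0) for x in clocks}

rho = reset({}, "x", 1.5)
nu = evaluate(rho, 3.0, 1.0, ["x", "y"])
# x was reset at time 1.5, so nu["x"] = 3.0 - 1.5;
# y was never reset, so nu["y"] = 3.0 - 1.0 (time elapsed since the epoch t0)
```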

For a timed word w, a timed automaton \(\mathcal {A}\) and each \(i,j\in [1, |w|]\), we define the set of “configurations”: \( Conf (i,j) = \bigl \{ (s,\rho ,T) \,\bigl |\bigr .\, \forall t_0 \in T.\, \exists (\overline{s},\overline{\nu }).\, (\overline{s},\overline{\nu }) \text { is a run of } \mathcal {A} \text { over } w(i,j) - t_0 \text {, } s_{|\overline{s}|-1} = s \text {, and } \nu _{|\overline{\nu }|-1} = \mathrm {eval}(\rho ,\tau _j,t_0)\bigr \} \). Further details are in Appendix C in [23].

Fig. 6.

The order of examination of \((i,j)\) in our algorithms for timed pattern matching

Our first (naive) algorithm for timed pattern matching is in Algorithm 1. We conduct a brute-force breadth-first search, computing \(\bigl \{\,(t,t')\in \mathcal {M}(w,\mathcal {A})\,\bigl |\bigr .\, \tau _{i-1}\le t< \tau _{i}, \tau _{j} < t'\le \tau _{j+1}\,\bigr \}\) for each \((i,j)\), with the aid of \( Conf (i,j)\) in Definition 4.2. (The singular case of \(\forall i.\, \tau _{i}\not \in (t,t')\) is separately taken care of by \( Immd \).) We do so in the order illustrated in Fig. 6: we decrement i, and for each i we increment j. This order (which flips the one in Fig. 4) is chosen for the purpose of the Boyer-Moore type optimization later in Sect. 5. In Appendix C in [23] we provide further details.
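The order of examination can be pictured by the following loop skeleton (a simplification of ours; the propagation of configurations and the zone construction of Algorithm 1 are elided):

```python
def match_order(n):
    """Yield the pairs (i, j) in the order of Fig. 6: i is decremented
    from n down to 1, and for each i, j is incremented from i up to n.
    (In Algorithm 1 the inner loop stops early once Conf(i, j) is empty.)"""
    for i in range(n, 0, -1):
        for j in range(i, n + 1):
            yield (i, j)

list(match_order(3))
# [(3, 3), (2, 2), (2, 3), (1, 1), (1, 2), (1, 3)]
```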

Theorem 4.3

(termination and correctness of Algorithm 1)
  1.

    Algorithm 1 terminates and its answer Z is a finite union of zones.

  2.

    For any \(t, t' \in \mathbb {R}_{>0}\) such that \(t < t'\), the following are equivalent: (1) there is a zone \((T_0,T_f,T_{\varDelta }) \in Z\) such that \((t,t') \in (T_0,T_f,T_{\varDelta })\); and (2) there is an accepting run \((\overline{s},\overline{\nu })\) of \(\mathcal {A}\) over \(w|_{(t,t')}\).

   \(\square \)

As an immediate corollary, we conclude that a match set \(\mathcal {M} (w,\mathcal {A})\) always allows representation by finitely many zones.

Changing the order of examination of \((i,j)\) (Fig. 6) gives us an online variant of Algorithm 1. It is presented in Appendix D in [23]; our Boyer-Moore type algorithm, however, is based on the original Algorithm 1.

5 A Timed Boyer-Moore Type Algorithm

Here we describe our main contribution, namely a Boyer-Moore type algorithm for timed pattern matching. Much like the algorithm in [24] skips some j’s in Fig. 4 (Sect. 2.3), we wish to skip some i’s in Fig. 6. Let us henceforth fix a timed word \(w = (\overline{a},\overline{\tau })\) and a timed automaton \(\mathcal {A} = (\varSigma \amalg \{\$\},S,S_0,C,E,F)\) as the input of the problem.

Let us define the optimal skip value function by \( Opt (i) = \min \{n \in \mathbb {Z}_{>0} \mid \exists t \in [\tau _{i-n-1},\tau _{i-n}).\, \exists t' \in (t,\infty ).\,(t,t') \in \mathcal {M} (w,\mathcal {A})\}\); the value \( Opt (i)\) designates the biggest skip value, at each i in Fig. 6, that does not change the outcome of the algorithm. Since the function \( Opt \) is not amenable to efficient computation in general, our goal is an underapproximation of it that is easily computed.

Towards that goal we follow the (untimed) pattern matching algorithm in [24]; see Sect. 2.3. In applying the same idea as in Fig. 5 to define a skip value, however, the first obstacle is that the language \(L'_{s_{3}}\)—the set of (suitable prefixes of) all words that lead to \(s_{3}\)—becomes an infinite set in the current timed setting. Our countermeasure is to use a region automaton \(R(\mathcal {A})\) (Definition 2.5) for representing the set.

We shall first introduce some constructs used in our algorithm.

Definition 5.1

(\(\mathcal {W}(r),\mathcal {W}(\overline{s},\overline{\alpha })\)). Let r be a set of runs of the timed automaton \(\mathcal {A}\). We define a timed language \( \mathcal {W}(r)= \bigl \{\,(\overline{a},\overline{\tau })\,\bigl |\bigr .\,\text {in}~r~\text {there is a run of}\) \(\mathcal {A}~\text {over}~(\overline{a},\overline{\tau })\,\bigr \} \).

For the region automaton \(R(\mathcal {A})\), each run \((\overline{s},\overline{\alpha })\) of \(R(\mathcal {A})\)—where \(s_{k}\in S\) and \(\alpha _{k}\in (\mathbb {R}_{\ge 0})^{C}/{\sim }\), recalling the state space of \(R(\mathcal {A})\) from Definition 2.5—is naturally identified with a set of runs of \(\mathcal {A}\), namely \(\{ (\overline{s},\overline{\nu })\in \bigl (S\times (\mathbb {R}_{\ge 0})^{C}\bigr )^{*} \mid \nu _k \in \alpha _k \text { for each}~k\}\). Under this identification we shall sometimes write \(\mathcal {W}(\overline{s},\overline{\alpha })\) for a suitable timed language, too.

The above definitions of \(\mathcal {W}(r)\) and \(\mathcal {W}(\overline{s},\overline{\alpha })\) naturally extend to a set r of partial runs of \(\mathcal {A}\), and to a partial run \((\overline{s},\overline{\alpha })\) of \(R(\mathcal {A})\), respectively. Here a partial run is defined like a run, except that we require neither that it start at an initial state nor that its initial clock interpretation be 0.

The next optimization of \(R(\mathcal {A})\) is similar to so-called trimming, except that we keep those states which do not lead to any final state (they become necessary later).

Definition 5.2

(\(R^{\mathrm {r}}(\mathcal {A})\)). For a timed automaton \(\mathcal {A}\), we let \(R^{\mathrm {r}}(\mathcal {A})\) denote its reachable region automaton. It is the NFA \(R^{\mathrm {r}}(\mathcal {A})=(\varSigma , S^{\mathrm {r}}, S_{0}^{\mathrm {r}}, E^{\mathrm {r}}, F^{\mathrm {r}})\) obtained from \(R(\mathcal {A})\) (Definition 2.5) by removing all those states which are unreachable from any initial state.

We are ready to describe our Boyer-Moore type algorithm. We use a skip value function \(\varDelta _{2}\) that is similar to the one in Sect. 2.3 (see Figs. 3 and 5), computed with the aid of \(m_{s}\) and \(L'_{s}\) defined for each state s. We define \(m_{s}\) and \(L'_{s}\) using the NFA \(R^{\mathrm {r}}(\mathcal {A})\). Notable differences are: (1) here \(L'_{s}\) and \(L'\) are sets of runs, not of words; and (2) since the orders are flipped between Figs. 4 and 6, \( Rev \) e.g. in (6) is gone now.

The precise definitions are as follows. Here \(s \in S\) is a state of the (original) timed automaton \(\mathcal {A}\); and we let \( R^{\mathrm {r}} (s) = \{ (s,\alpha ) \in S^{\mathrm {r}} \} \).
$$\begin{aligned} \begin{aligned} m&= \min \{|w'|\mid w' \in L (\mathcal {A})\} \qquad m_s = \min \bigl \{\,|r| \,\bigl |\bigr .\, \beta _0 \in S_{0}^{\mathrm {r}}, \beta \in R^{\mathrm {r}}(s), r \in Runs _{R^{\mathrm {r}} (\mathcal {A})} (\beta _0,\beta )\, \bigr \} \\ L'&= \bigl \{r (0,m-1) \mid \beta _0 \in S_{0}^{\mathrm {r}}, \beta _f \in F^{\mathrm {r}}, r \in Runs _{R^{\mathrm {r}} (\mathcal {A})} (\beta _0,\beta _f)\bigr \} \\ L'_s&= \bigl \{\,r (0,\min \{m,m_s\}-1) \,\bigl |\bigr .\, \beta _0 \in S_{0}^{\mathrm {r}}, \beta \in R^{\mathrm {r}}(s), r \in Runs _{R^{\mathrm {r}}(\mathcal {A})} (\beta _0,\beta )\,\bigr \} \end{aligned} \end{aligned}$$
(7)
Note again that these data are defined over \(R^{\mathrm {r}}(\mathcal {A})\) (Definition 5.2); \( Runs _{R^{\mathrm {r}}(\mathcal {A})} (\beta _0,\beta _f)\) is from Definition 2.6.
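Since \(R^{\mathrm {r}}(\mathcal {A})\) is a finite NFA, the quantities m and \(m_{s}\) in (7) are essentially shortest-run lengths and can be computed by breadth-first search. The following is a sketch of ours over a hypothetical NFA given as an adjacency map (transition labels are irrelevant to run lengths, and the off-by-one between counting states and transitions of a run is left aside):

```python
from collections import deque

def shortest_run_lengths(initial, edges):
    """BFS over an NFA's state graph: for each reachable state beta, the
    minimal number of transitions of a run from an initial state to beta.
    `edges` maps a state to an iterable of successor states."""
    dist = {b: 0 for b in initial}
    queue = deque(initial)
    while queue:
        b = queue.popleft()
        for b2 in edges.get(b, ()):
            if b2 not in dist:
                dist[b2] = dist[b] + 1
                queue.append(b2)
    return dist

# m   would be the minimum of dist over the accepting states of R^r(A);
# m_s would be the minimum of dist over the states in R^r(s).
```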
Fig. 7.

An example of a timed automaton \(\mathcal {A}\), and the values \(m_{s}, L'_{s}, L'\)

Definition 5.3

(\(\varDelta _{2}\)). Let \( Conf \) be a set of triples \((s,\rho ,T)\) of: a state \(s\in S\) of \(\mathcal {A}\), \(\rho :C\rightharpoonup \mathbb {R}_{>0}\), and an interval T. (This is much like \( Conf (i,j)\) in Definition 4.2.) We define the skip value \(\varDelta _2 ( Conf )\) as follows.
$$\begin{aligned} \begin{array}{rl} d_1 (r) &{}= \min \nolimits _{r' \in L'} \min \{ n \in \mathbb {Z}_{>0} \mid \mathcal {W}(r) \cap \bigl (\,\bigcup \nolimits _{r''\in \mathrm {pref}\bigl (r'(n,|r'|)\bigr )}\mathcal {W}(r'') \,\bigr ) \ne \emptyset \,\}\\ d_2 (r) &{}= \min \nolimits _{r' \in L'} \min \{ n \in \mathbb {Z}_{>0} \mid \bigl (\bigcup \nolimits _{r''\in \mathrm {pref}(r)}\mathcal {W}(r'')\bigr ) \cap \mathcal {W}\bigl (r'(n,|r'|)\bigr ) \ne \emptyset \,\}\\ \varDelta _2 ( Conf ) &{}= \max \nolimits _{(s,\rho ,T) \in Conf } \min \nolimits _{r \in L'_s} \min \{d_1 (r),d_2 (r)\}. \end{array} \end{aligned}$$

Here \(r\in Runs _{R^{\mathrm {r}}(\mathcal {A})}\); \(L'\) is from (7); \(\mathcal {W}\) is from Definition 5.1; \(r'(n,|r'|)\) is a subsequence of \(r'\) (that is a partial run); and \(\mathrm {pref}(r)\) denotes the set of all prefixes of r.

Theorem 5.4

(correctness of \(\varDelta _{2}\) ). Let \(i\in [1,|w|\,]\), and \(j=\max \{j\in [i,|w|\,]\mid Conf (i,j)\ne \emptyset \}\), where \( Conf (i,j)\) is from Definition 4.2. (In case \( Conf (i,j)\) is everywhere empty we let \(j=i\).) Then we have \( \varDelta _2( Conf (i,j))\le Opt (i)\).    \(\square \)

The remaining issue in Definition 5.3 is that the sets like \(\mathcal {W}(r)\) and \(\mathcal {W}(r'')\) can be infinite—we need to avoid their direct computation. We rely on the usual automata-theoretic trick: the intersection of languages is recognized by a product automaton.

Given two timed automata \(\mathcal {B}\) and \(\mathcal {C}\), we let \(\mathcal {B}\times \mathcal {C}\) denote their product defined in the standard way (see e.g. [19]). The following is straightforward.

Proposition 5.5

Let \(r = (\overline{s},\overline{\alpha })\) and \(r' = (\overline{s'},\overline{\alpha '})\) be partial runs of \(R(\mathcal {B})\) and \(R(\mathcal {C})\), respectively; they are naturally identified with sets of partial runs of \(\mathcal {B}\) and \(\mathcal {C}\) (Definition 5.1). Assume further that \(|r| = |r'|\). Then we have \(\mathcal {W} (r) \cap \mathcal {W} (r') = \mathcal {W} (r,r')\), where \((r,r')\) is the following set of runs of \(\mathcal {B}\times \mathcal {C}\): \((r,r') = \bigl \{\, \bigl ((\overline{s},\overline{s'}),(\overline{\nu } , \overline{\nu '})\bigr ) \mid (\overline{s},\overline{\nu }) \in (\overline{s},\overline{\alpha }) \text { and } (\overline{s'},\overline{\nu '}) \in (\overline{s'},\overline{\alpha '}) \,\bigr \}\).    \(\square \)

The proposition allows the following algorithm for the emptiness check required in computing \(d_{1}\) (Definition 5.3). Firstly we distribute \(\cap \) over \(\bigcup \); then we need to check if \(\mathcal {W}(r)\cap \mathcal {W}(r'')\ne \emptyset \) for each \(r''\). The proposition reduces this to checking if \(\mathcal {W}(r,r'')\ne \emptyset \), that is, if \((r,r'')\) is a (legitimate) partial run of the region automaton \(R (\mathcal {A} \times \mathcal {A})\). The last check is obviously decidable since \(R (\mathcal {A} \times \mathcal {A})\) is finite. For \(d_{2}\) the situation is similar.
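The underlying automata-theoretic trick can be sketched for plain (untimed) NFAs; the region and product details specific to \(R(\mathcal {A} \times \mathcal {A})\) are elided, and the encoding below is our own:

```python
def intersection_nonempty(nfa1, nfa2):
    """Emptiness check for the intersection of two NFAs via the product
    construction: L(A1) ∩ L(A2) != ∅ iff some pair of accepting states
    is reachable in the product automaton. Each NFA is a triple
    (initial_states, delta, final_states), where delta maps a state to
    a dict {letter: iterable_of_successor_states}."""
    (init1, delta1, fin1), (init2, delta2, fin2) = nfa1, nfa2
    frontier = [(p, q) for p in init1 for q in init2]
    seen = set(frontier)
    while frontier:
        p, q = frontier.pop()
        if p in fin1 and q in fin2:
            return True          # a joint accepting run exists
        for a, succs in delta1.get(p, {}).items():
            for p2 in succs:
                for q2 in delta2.get(q, {}).get(a, ()):
                    if (p2, q2) not in seen:
                        seen.add((p2, q2))
                        frontier.append((p2, q2))
    return False
```

For example, with `a1` accepting exactly the word "a" and `a2` accepting "a" or "b", the check succeeds; against an NFA accepting only "b" it fails.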

We also note that the computation of \(\varDelta _{2}\) (Definition 5.3) can be accelerated by memoizing the values \(\min _{r \in L'_s}\min \{d_1 (r),d_2 (r)\}\) for each s.

Finally our Boyer-Moore type algorithm for timed pattern matching is Algorithm 3 in Appendix E in [23]. Its main differences from the naive one (Algorithm 1) are: (1) initially we start with \(i=|w|-m+1\) instead of \(i=|w|\) (line 1 of Algorithm 1); and (2) we decrement i by the skip value computed by \(\varDelta _{2}\), instead of by 1 (line 33 of Algorithm 1).
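The control flow just described can be sketched as follows (a simplification with our own naming; the inner j-loop and the zone computation of Algorithm 3 are elided):

```python
def boyer_moore_scan(n, m, skip_value):
    """Skeleton of the Boyer-Moore type scan: start at i = n - m + 1
    instead of i = n, and decrement i by the computed skip value instead
    of by 1. `skip_value(i)` stands in for Delta_2 applied to the
    configurations found at position i; a sound skip is always >= 1."""
    visited = []
    i = n - m + 1
    while i >= 1:
        visited.append(i)           # examine position i (j-loop elided)
        i -= max(1, skip_value(i))  # skip i's that provably yield no match
    return visited

boyer_moore_scan(10, 3, lambda i: 2)
# starts at i = 8 and skips by 2: [8, 6, 4, 2]
```

With the constant skip 1 this degenerates to the naive scan over all positions, which is the sense in which \(\varDelta _{2}\) only ever saves work.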

It is also possible to employ an analog of the skip value function \(\varDelta _1\) in Sects. 2.2 and 2.3. For \(c\in \varSigma \) and \(p\in \mathbb {Z}_{>0}\), we define \( \varDelta _1 (c,p) = \min _{k> 0} \{k - p \mid k > m \text { or } \exists (\overline{a},\overline{\tau }) \in \mathcal {W}(L').\, a_k = c \} \). Here m and \(L'\) are from (7). Then we can possibly skip more i’s using both \(\varDelta _{1}\) and \(\varDelta _{2}\); see Appendix E in [23] for details. In our implementation we do not use \(\varDelta _{1}\), though, following the (untimed) pattern matching algorithm in [24]. Investigating the effect of additionally using \(\varDelta _{1}\) is future work.

One may think of the following alternative for pattern matching: we first forget about time stamps, time constraints, etc.; the resulting “relaxed” untimed problem can be solved by the algorithm in [24] (Sect. 2.3); and then we introduce the time constraints back and refine the interim result to the correct one. Our timed Boyer-Moore algorithm has greater skip values in general, however, because by using region automata \(R(\mathcal {A}),R^{\mathrm {r}}(\mathcal {A})\) we also take time constraints into account when computing skip values.

Remark 5.6

It was suggested by multiple reviewers that our use of region automata be replaced with that of zone automata (see e.g. [4]). This can result in a much smaller automaton \(R(\mathcal {A})\) for calculating skip values (cf. Definition 5.3 and Case 2 of Sect. 6). More importantly, zone automata are insensitive to the time unit size—unlike region automata where the numbers \(c_{x}\) in Definition 2.5 govern their size—a property desired in actual deployment of timed pattern matching. This is a topic of our imminent future work.

6 Experiments

We implemented both our naive offline algorithm and our Boyer-Moore type algorithm (without \(\varDelta _1\)) in C++ [22]. We ran our experiments on a MacBook Air (Mid 2011) with an Intel Core i7-2677M 1.80 GHz CPU, 3.7 GB RAM and Arch Linux (64-bit). Our programs were compiled with GCC 5.3.0 at optimization level O3.

An execution of our Boyer-Moore type algorithm consists of two phases: in the first pre-processing phase we compute the skip value function \(\varDelta _{2}\)—to be precise the value \(\min _{r \in L'_s}\min \{d_1 (r),d_2 (r)\}\) for each s, on which \(\varDelta _{2}\) relies—and in the latter “matching” phase we actually compute the match set.

As input we used five test cases: each case consists of a timed automaton \(\mathcal {A}\) and multiple timed words w of varying length |w|. Cases 1 and 4 are from a previous work [20] on timed pattern matching; in Case 2 the timed automaton \(\mathcal {A}\) is not expressible by a timed regular expression (TRE, see [12] and Sect. 2.1); and Cases 3 and 5 are original to this work. In particular Case 5 comes from an automotive example.

Our principal interest is in the relationship between execution time and the length |w| (i.e. the number of events), for both the naive and the Boyer-Moore algorithms. For each test case we ran our programs 30 times; the presented execution time is the average. We measured the execution time separately for the pre-processing phase and the (main) matching phase; in the figures we present the time for the latter.

We present an overview of the results. Further details are in Appendix G in [23].

Fig. 8.

Case 1: \(\mathcal {A}\) and execution time

Fig. 9.

Case 3: \(\mathcal {A}\) and execution time

Fig. 10.

Case 2: \(\mathcal {A}\) and execution time

Fig. 11.

Case 5: \(\mathcal {A}\) and execution time

Case 1: No Clock Constraints. In Fig. 8 we present a timed automaton \(\mathcal {A}\) and the execution time (excluding pre-processing) for 37 timed words w whose lengths range from 20 to 1,024,000. Each timed word w is an alternation of \(a,b\in \varSigma \), and its time stamps are randomly generated according to a certain uniform distribution.

The automaton \(\mathcal {A}\) is without any clock constraints, so in this case the problem is almost that of untimed pattern matching. The Boyer-Moore algorithm outperforms the naive one; but the gap is approximately 1/10 when |w| is large enough, which is smaller than what one would expect from the fact that i is always decremented by 2. This is because, as some combinatorial investigation would reveal, the skipped i’s are precisely those for which we examine fewer j’s.

The pre-processing phase (which relies only on \(\mathcal {A}\)) took \(6.38 \cdot 10^{-2}\) ms on average.

Case 2: Beyond Expressivity of TREs. In Fig. 10 are a timed automaton \(\mathcal {A}\)—one that is not expressible with TREs [12]—and the execution time for 20 timed words w whose lengths range from 20 to 10,240. Each w is a repetition of \(a \in \varSigma \), and its time stamps are randomly generated according to the uniform distribution in the interval (0, 0.1).

One can easily see that the skip value is always 1, so our Boyer-Moore algorithm is slightly slower due to the overhead of repeatedly reading the result of pre-processing. The naive algorithm (and hence the Boyer-Moore one too) exhibits non-linear increase in Fig. 10; this is because its worst-case complexity is bounded by \(|w||E|^{{|w|}+1}\) (where |E| is the number of edges in \(\mathcal {A}\)). See the proof of Theorem 4.3 (Appendix F.1 in [23]). The factor |E| in the above complexity bound stems essentially from nondeterminism.

The pre-processing phase took \(1.39 \cdot 10^{2}\) ms on average.

Case 3: Accepting Runs are Long. In Fig. 9 are a timed automaton \(\mathcal {A}\) and the execution time for 49 timed words w whose lengths range from 8,028 to 10,243,600. Each w is randomly generated as follows: it is a repetition of \(a\in \varSigma \); a is repeated according to the exponential distribution with a parameter \(\lambda \); and we do so for a fixed duration \(\tau _{|\overline{\tau }|}\), generating a timed word of length |w|. See Table 3 in Appendix G in [23].

In the automaton \(\mathcal {A}\) the length m of the shortest accepting run is large; hence so are the skip values in the Boyer-Moore optimization. (Specifically the skip value is 5 if both \(\tau _i - \tau _{i-1}\) and \(\tau _{i+1} - \tau _{i}\) are greater than 1.) Indeed, as we see from the figure, the Boyer-Moore algorithm outperforms the naive one roughly by a factor of two.

The pre-processing phase took 7.02 ms on average. This is negligible in practice; recall that pre-processing is done only once, when \(\mathcal {A}\) is given.

Case 4: Region Automata are Big. Here \(\mathcal {A}\) is a translation of the TRE \(\bigl \langle \bigl (\,\bigl (\langle \mathrm {p}\rangle _{(0,10]}\langle \mathrm {\lnot p}\rangle _{(0,10]}\bigr )^*\wedge \bigl (\langle \mathrm {q}\rangle _{(0,10]}\langle \mathrm {\lnot q}\rangle _{(0,10]}\bigr )^*\,\bigr )\$\bigr \rangle _{(0,80]}\). We executed our two algorithms for 12 timed words w whose lengths range from 1,934 to 31,935. Each w is generated randomly as follows: it is the interleaving combination of an alternation of \(p,\lnot p\) and one of \(q,\lnot q\); in each alternation the time stamps are governed by the exponential distribution with a parameter \(\lambda \); and its duration \(\tau _{|\tau |}\) is fixed.

This \(\mathcal {A}\) is bad for our Boyer-Moore type algorithm since its region automaton \(R(\mathcal {A})\) is very big. Specifically: the numbers \(c_{x}\) in Definition 2.5 are big (10 and 80), and accordingly we need many states in \(R(\mathcal {A})\) (recall that \(\sim \) distinguishes clock valuations by their integer parts). Indeed, the construction of \(R^{\mathrm {r}} (\mathcal {A})\) took ca. 74 s, and the construction of \(R(\mathcal {A} \times \mathcal {A})\) did not complete due to RAM shortage. Therefore we could not complete pre-processing for Boyer-Moore. We note however that our naive algorithm worked fine. See Table 4 in Appendix G in [23].

Case 5: An Automotive Example. This final example (Fig. 11) is about anomaly detection of engines. The execution time is shown for 10 timed words w whose lengths range from 242,808 to 4,873,207. Each w is obtained as a discretized log of the simulation of the model sldemo_enginewc.slx in the Simulink Demo palette [16]: here the input of the model (desired rpm) is generated randomly according to the Gaussian distribution with \(\mu =\text {2,000}\) rpm and \(\sigma ^{2}=10^{6}\) \(\text {rpm}^{2}\); we discretized the output of the model (engine torque) into two statuses, \( high \) and \( low \), with the threshold of \(40\mathrm {N \cdot m}\).

This test case is meant to be a practical example in automotive applications—our original motivation for the current work. The automaton \(\mathcal {A}\) expresses: the engine torque is \( high \) for more than 1 s (the kind of anomaly we are interested in) and the log is not too sparse (which means the log is a credible one).

Here the Boyer-Moore algorithm outperforms the naive one roughly by a factor of two. The pre-processing phase took 9.94 ms on average.

Lacking in the current section are: detailed comparison with the existing implementations (e.g. in [20], modulo the word-signal difference in Remark 2.2); and performance analysis when the specification \(\mathcal {A}\), instead of the input timed word w, grows. We intend to address these issues in the coming extended version.

Footnotes

  1.

    For example, in [6], a payment transaction record of 300 K users over almost a year is monitored—against various properties, some of them timed and others not—and they report the task took hundreds of hours.


Acknowledgments

Thanks are due to the anonymous referees for their careful reading and expert comments. The authors are supported by Grant-in-Aid No. 15KT0012, JSPS; T.A. is supported by Grant-in-Aid for JSPS Fellows.

References

  1. Alur, R., Dill, D.L.: A theory of timed automata. Theor. Comput. Sci. 126(2), 183–235 (1994)
  2. Asarin, E., Caspi, P., Maler, O.: A Kleene theorem for timed automata. In: Proceedings of LICS 1997, pp. 160–171. IEEE Computer Society (1997)
  3. Asarin, E., Caspi, P., Maler, O.: Timed regular expressions. J. ACM 49(2), 172–206 (2002)
  4. Behrmann, G., Bouyer, P., Larsen, K.G., Pelánek, R.: Lower and upper bounds in zone-based abstractions of timed automata. STTT 8(3), 204–215 (2006)
  5. Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Commun. ACM 20(10), 762–772 (1977)
  6. Colombo, C., Pace, G.J.: Fast-forward runtime monitoring — an industrial case study. In: Qadeer, S., Tasiran, S. (eds.) RV 2012. LNCS, vol. 7687, pp. 214–228. Springer, Heidelberg (2013)
  7. Deshmukh, J.V., Donzé, A., Ghosh, S., Jin, X., Juniwal, G., Seshia, S.A.: Robust online monitoring of signal temporal logic. In: Bartocci, E., Majumdar, R. (eds.) RV 2015. LNCS, vol. 9333, pp. 55–70. Springer, Heidelberg (2015). doi: 10.1007/978-3-319-23820-3_4
  8. Dokhanchi, A., Hoxha, B., Fainekos, G.: On-line monitoring for temporal logic robustness. In: Bonakdarpour, B., Smolka, S.A. (eds.) RV 2014. LNCS, vol. 8734, pp. 231–246. Springer, Heidelberg (2014)
  9. Donzé, A., Ferrère, T., Maler, O.: Efficient robust monitoring for STL. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 264–279. Springer, Heidelberg (2013)
  10. Ferrère, T., Maler, O., Ničković, D., Ulus, D.: Measuring with timed patterns. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9207, pp. 322–337. Springer, Heidelberg (2015)
  11. Geist, J., Rozier, K.Y., Schumann, J.: Runtime observer pairs and Bayesian network reasoners on-board FPGAs: flight-certifiable system health management for embedded systems. In: Bonakdarpour, B., Smolka, S.A. (eds.) RV 2014. LNCS, vol. 8734, pp. 215–230. Springer, Heidelberg (2014)
  12. Herrmann, P.: Renaming is necessary in timed regular expressions. In: Pandu Rangan, C., Raman, V., Sarukkai, S. (eds.) FST TCS 1999. LNCS, vol. 1738, pp. 47–59. Springer, Heidelberg (1999)
  13. Ho, H.-M., Ouaknine, J., Worrell, J.: Online monitoring of metric temporal logic. In: Bonakdarpour, B., Smolka, S.A. (eds.) RV 2014. LNCS, vol. 8734, pp. 178–192. Springer, Heidelberg (2014)
  14. Kane, A., Chowdhury, O., Datta, A., Koopman, P.: A case study on runtime monitoring of an autonomous research vehicle (ARV) system. In: Bartocci, E., Majumdar, R. (eds.) RV 2015. LNCS, vol. 9333, pp. 102–117. Springer, Heidelberg (2015). doi: 10.1007/978-3-319-23820-3_7
  15. Knuth, D.E., Morris Jr., J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)
  16. Simulink User's Guide. The MathWorks Inc., Natick (2015)
  17.
  18. Ouaknine, J., Worrell, J.: On the decidability and complexity of metric temporal logic over finite words. Logical Meth. Comput. Sci. 3(1), 1–27 (2007)
  19. Pandya, P.K., Suman, P.V.: An introduction to timed automata. In: Modern Applications of Automata Theory, pp. 111–148. World Scientific (2012)
  20. Ulus, D., Ferrère, T., Asarin, E., Maler, O.: Timed pattern matching. In: Legay, A., Bozga, M. (eds.) FORMATS 2014. LNCS, vol. 8711, pp. 222–236. Springer, Heidelberg (2014)
  21. Ulus, D., Ferrère, T., Asarin, E., Maler, O.: Online timed pattern matching using derivatives. In: Chechik, M., Raskin, J.-F. (eds.) TACAS 2016. LNCS, vol. 9636, pp. 736–751. Springer, Heidelberg (2016). doi: 10.1007/978-3-662-49674-9_47
  22. Waga, M., Akazaki, T., Hasuo, I.: Code that accompanies “A Boyer-Moore Type Algorithm for Timed Pattern Matching”. https://github.com/MasWag/timed-pattern-matching
  23. Waga, M., Akazaki, T., Hasuo, I.: A Boyer-Moore type algorithm for timed pattern matching (2016). CoRR, abs/1606.07207
  24. Watson, B.W., Watson, R.E.: A Boyer-Moore-style algorithm for regular expression pattern matching. Sci. Comput. Program. 48(2–3), 99–117 (2003)

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. University of Tokyo, Tokyo, Japan
  2. JSPS Research Fellow, Tokyo, Japan
