Expressive Stream Reasoning with Laser
Abstract
An increasing number of use cases require a timely extraction of nontrivial knowledge from semantically annotated data streams, especially on the Web and for the Internet of Things (IoT). Often, this extraction requires expressive reasoning, which is challenging to compute on large streams. We propose Laser, a new reasoner that supports a pragmatic, nontrivial fragment of the logic LARS which extends Answer Set Programming (ASP) for streams. At its core, Laser implements a novel evaluation procedure which annotates formulae to avoid the recomputation of duplicates at multiple time points. This procedure, combined with a judicious implementation of the LARS operators, is responsible for significantly better runtimes than the ones of other state-of-the-art systems like C-SPARQL and CQELS, or an implementation of LARS which runs on the ASP solver Clingo. This enables the application of expressive logic-based reasoning to large streams and opens the door to a wider range of stream reasoning use cases.
1 Introduction
The Web and the emerging Internet of Things (IoT) are highly dynamic environments where streams of data are valuable sources of knowledge for many use cases, like traffic monitoring, crowd control, security, or autonomous vehicle control. In this context, reasoning can be applied to extract implicit knowledge from the stream. For instance, reasoning can be applied to detect anomalies in the flow of information, and provide clear explanations that can guide a prompt understanding of the situation.
Problem. Reasoning on data streams should be done in a timely manner [11, 21]. This task is challenging for several reasons: First, expressive reasoning that supports features for a fine-grained control of temporal information may come with an unfavourable computational complexity. This clashes with the requirement of a reactive system that shall work in a highly dynamic environment. Second, the continuous flow of incoming data calls for incremental evaluation techniques that go beyond repeated querying and recomputation. Third, there is no consensus on the formal semantics for the processing of streams, which hinders a meaningful and fair comparison between stream reasoners.
Despite recent substantial progress in the development of stream reasoners, to the best of our knowledge there is still no reasoning system that addresses all three challenges. Some systems can handle large streams but do not support expressive temporal reasoning features [3, 5, 17, 19]. Other approaches focus on the formal semantics but do not provide implementations [14]. Finally, some systems implemented only a particular rule set and cannot be easily generalized [16, 27].
Contribution. We tackle the above challenges with the following contributions.

We present Laser, a novel stream reasoning system based on the recent rule-based framework LARS [9], which extends Answer Set Programming (ASP) for stream reasoning. Programs are sets of rules which are constructed on formulae that contain window operators and temporal operators. Thereby, Laser has a fully declarative semantics amenable to formal comparison.

To address the trade-off between expressiveness and data throughput, we employ a tractable fragment of LARS that ensures uniqueness of models. Thus, in addition to typical operators and window functions, Laser also supports operators such as \(\Box \), which enforces validity over intervals of time points, and @, which is useful to state or retrieve the specific time points at which atoms hold.

We provide a novel evaluation technique which annotates formulae with two time markers. When a grounding of a formula \(\varphi \) is derived, it is annotated with an interval [c, h] from a consideration time c to a horizon time h, during which \(\varphi \) is guaranteed to hold. By efficiently propagating and removing these annotations, we obtain an incremental model update that may avoid many unnecessary recomputations. Also, these annotations enable us to implement a technique similar to the Semi-Naive Evaluation (SNE) of Datalog programs [1] to reduce duplicate derivations.

We present an empirical comparison of the performance of Laser against the state-of-the-art engines, i.e., C-SPARQL [5] and CQELS [19], using microbenchmarks and a more complex program. We also compare Laser with an open-source implementation of LARS which is based on the ASP solver Clingo to test operators not supported by the other engines.
Our empirical results are encouraging as they show that Laser outperforms the other systems, especially with large windows where our incremental approach is beneficial. This allows the application of expressive logic-based reasoning to large streams and to a wider range of use cases. To the best of our knowledge, no comparable stream reasoning system that combines similar expressiveness with efficient computation exists to date. See [7] for an extended version of this paper.
2 Theoretical Background: LARS
As formal foundation, we use the logic-based framework LARS [9]. We focus on a pragmatic fragment called Plain LARS, first mentioned in [8]. We assume the reader is familiar with basic notions, in particular those of logic programming. Throughout, we distinguish extensional atoms \(\mathcal {A}^\mathcal {E}\) for input and intensional atoms \(\mathcal {A}^\mathcal {I}\) for derivations. By \(\mathcal {A}= \mathcal {A}^\mathcal {E}\cup \mathcal {A}^\mathcal {I}\), we denote the set of atoms. Basic arithmetic operations and comparisons are assumed to be given in the form of designated extensional predicates, but written with infix notation as usual. We use upper-case letters X, Y, Z to denote variables, lower-case letters \(x,y,\ldots \) for constants, and p, a, b, q for predicates.
Definition 1
(Stream). A stream \({S=(T,v)}\) consists of a timeline T, which is a closed interval in \(\mathbb {N}\), and an evaluation function \({v: \mathbb {N}\mapsto 2^\mathcal {A}}\). The elements \({t \in T}\) are called time points.
Intuitively, a stream S associates with each time point a set of atoms. We call S a data stream, if it contains only extensional atoms. To cope with the amount of data, one usually considers only recent atoms. Let \({S=(T,v)}\) and \({S'=(T',v')}\) be two streams s.t. \({S' \subseteq S}\), i.e., \({T' \subseteq T}\) and \({v'(t') \subseteq v(t')}\) for all \({t' \in T'}\). Then \(S'\) is called a window of S.
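The notions of stream and window can be sketched in a few lines of Python, assuming atoms are plain strings, a timeline is a pair of end points, and the evaluation function is a dict; the names below are illustrative, not Laser's API.

```python
# Minimal sketch of Definition 1 and the substream (window) relation.
# A stream is a pair (T, v): T = (t1, t2) a closed interval of naturals,
# v a dict mapping time points to sets of atom strings.

def atoms_at(v, t):
    return v.get(t, set())

def is_window(sub, stream):
    """S' = (T', v') is a window of S = (T, v) iff T' is contained in T
    and v'(t) is a subset of v(t) for every t in T'."""
    (t1, t2), v_sub = sub
    (T1, T2), v = stream
    if not (T1 <= t1 <= t2 <= T2):
        return False
    return all(atoms_at(v_sub, t) <= atoms_at(v, t) for t in range(t1, t2 + 1))
```

For the data stream of Example 1 below, the snapshot with timeline [38, 41] containing the last three atoms passes this check.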
Definition 2
(Window function). Any (computable) function w that returns, given a stream \(S=(T,v)\) and a time point \({t \in \mathbb {N}}\), a window \(S'\) of S, is called a window function.
In this work, we focus on two prominent sliding windows that select recent atoms based on time and on counting, respectively. A sliding time-based window selects all atoms appearing in the last n time points.
Definition 3
(Sliding Time-based Window). Let \({S=(T,v)}\) be a stream, \({t \in T=[t_1,t_2]}\) and let \({n \in \mathbb {N}}\), \(n \ge 0\). Then the sliding time-based window function \(\tau _n\) (for size n) is \({ \tau _n(S,t) = (T',v_{T'})}\), where \({T'=[t',t]}\) and \({t' = \max \{t_1,t-n\}}\).
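Definition 3 translates almost literally into code. A minimal sketch, assuming a stream is represented as a pair of a closed interval and a dict from time points to atom sets (`time_window` is an illustrative name):

```python
def time_window(stream, t, n):
    """tau_n(S, t): restrict S = (T, v) to the timeline [max(t1, t - n), t]."""
    (t1, _t2), v = stream
    lo = max(t1, t - n)  # the window never extends before the timeline starts
    return (lo, t), {u: set(v[u]) for u in v if lo <= u <= t}
```

On the stream of Example 1, the window of size 3 at time 41 has timeline (38, 41) and contains exactly the last three atoms.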
Similarly, a sliding tuple-based window selects the last n tuples. We define the tuple size \(\#(S)\) of a stream \(S=(T,v)\) as \(|\{ (a,t) \mid t \in T, a \in v(t)\}|\).
Definition 4
(Sliding Tuple-based Window). Let \({S=(T,v)}\) be a stream and \({t \in T}\), and let \({n \ge 1}\). The sliding tuple-based window function \(\#_n\) (for size n) returns a window \({\#_n(S,t) = (T',v')}\) of S with timeline \({T'=[t',t]}\) that contains the n most recent atoms up to time t (or all of them, if fewer than n exist).
We refer to these windows simply by time windows, resp. tuple windows. Note that for time windows, we allow size \(n=0\), which selects all atoms at the current time point, while the tuple window must select at least one atom, hence \(n \ge 1\).
Note that we associate with each time point a set of atoms. Thus, for the tuple-based window, if \([t',t]\) is the smallest timeline in which n atoms are found, then in general one might have to delete arbitrary atoms at time point \(t'\) such that exactly n remain in \([t',t]\).
Example 1
Consider a data stream \({D=(T,v_D)}\) as shown in Fig. 1, where \({T=[35,42]}\) and \(v_D=\{ 36 \mapsto \{a(x_1,y)\}, 38 \mapsto \{a(x_2,y),b(y,z)\}, 40 \mapsto \{a(x_3,y)\}\}\). The indicated time window of size 3 has timeline [38, 41] and only contains the last three atoms. Thus, the window is also the tuple window of size 3 at 40. Notably, [38, 41] is also the temporal extent of the tuple window of size 2, for which there are two options, dropping either \(a(x_2,y)\) or b(y, z) at time 38.
Although Definition 4 introduces nondeterminism, one may assume a deterministic function based on the implementation at hand. Here, we assume data is arriving in a strict order from which a natural deterministic tuple window follows.
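The deterministic reading just described can be sketched as follows, assuming the strict arrival order is given explicitly as a list of (atom, time point) pairs; `tuple_window` is an illustrative name, not Laser's API.

```python
def tuple_window(order, t, n):
    """#_n at t: keep the n most recently arrived (atom, time) pairs with
    time <= t. `order` is the assumed strict arrival order, which resolves
    deterministically which atoms to drop at the boundary time point t'."""
    arrived = [(a, u) for (a, u) in order if u <= t]
    kept = arrived[-n:]                     # the n most recent atoms
    lo = min(u for _a, u in kept) if kept else t
    v = {}
    for a, u in kept:
        v.setdefault(u, set()).add(a)
    return (lo, t), v
```

On Example 1's data, the size-2 tuple window at time 40 drops \(a(x_2,y)\), since it arrived before b(y, z) under the assumed order.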
Window operators \(\boxplus ^w\). A window function w can be accessed in rules by window operators. That is to say, an expression \(\boxplus ^w \alpha \) has the effect that \(\alpha \) is evaluated on the “snapshot” of the data stream delivered by its associated window function w. Within the selected snapshot, LARS allows one to control the temporal semantics with further modalities, as explained below.
2.1 Plain LARS Programs
Plain LARS programs as in [8] extend normal logic programs. We restrict here to positive programs, i.e., without negation.
A rule r is of the form \(\alpha \leftarrow \beta _1,\dots ,\beta _n\), where \( H (r)=\alpha \) is the head and \( B (r)=\{\beta _1,\dots ,\beta _n\}\) is the body of r. The head \(\alpha \) is of form a or \(@_t a\), where \({a \in \mathcal {A}^\mathcal {I}}\), and each \(\beta _i\) is an extended atom. A (positive plain) program P is a set of rules. We say an extended atom \(\beta \) occurs in a program P if \({\beta \in \{ H (r) \} \cup B (r)}\) for some rule \({r\in P}\).
Example 2
(cont’d). The rule \(r = q(X,Y,Z) \leftarrow \boxplus ^{3} \Diamond a(X,Y), \boxplus ^{\# 3} \Diamond b(Y,Z)\) expresses a query with a join over predicates a and b in the standard snapshot semantics: If for some variable substitutions for X, Y, Z, a(X, Y) holds some time during the last 3 time points and b(Y, Z) at some time point in the window of the last 3 tuples, then q(X, Y, Z) must be inferred.
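Under the snapshot semantics, this rule boils down to a plain relational join once both windows are materialized. A minimal sketch, assuming the windows are given as sets of argument tuples (the \(\Diamond \) operator has already abstracted away the time points); the function name is illustrative:

```python
def snapshot_join(a_atoms, b_atoms):
    """Derive q(X, Y, Z) from a(X, Y) atoms in the time window and
    b(Y, Z) atoms in the tuple window, joining on the shared variable Y."""
    return {(x, y, z)
            for (x, y) in a_atoms
            for (y2, z) in b_atoms
            if y == y2}
```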
We identify rules \(\alpha \leftarrow \beta _1,\dots ,\beta _n\) with implications \(\beta _1 \wedge \dots \wedge \beta _n \rightarrow \alpha \), thus obtaining by them and their subexpressions the set \(\mathcal {F}\) of formulae.
Definition 5
(Answer Stream). An interpretation stream I is an answer stream of program P for the data stream \({D \subseteq I}\) at time t, if \({M=\langle I,W,\mathcal {B}\rangle }\) is a minimal model of the reduct \(P^{M,t}=\{r \in P \mid M,t \,\models \, B (r)\}\).
Note that using tuple windows over intensional data seems neither useful nor intuitive. For instance, program \(P=\{a \leftarrow \boxplus ^{\# 1} \Diamond b\}\) is inconsistent for a data stream D at time t, where the last atom is b, occurring at time \(t-1\): by deriving a for time t, suddenly a would be the last tuple.
Proposition 1
Let P be a positive plain LARS program that employs only time windows, and tuple window operators only over extensional atoms. Then, P always has a unique answer stream.
Non-ground programs. We obtain the semantics for non-ground programs in a straightforward way by considering rules with variables as schematic descriptions of respective ground instantiations. Substitutions \(\sigma \) are defined as usual.
Example 3
At time \(t=41\), the time window \(\boxplus ^{3}\) and the tuple window \(\boxplus ^{\# 3}\) are identical, as indicated in Fig. 1, and contain atoms \(a(x_2,y)\), b(y, z), and \(a(x_3,y)\). Consider rule \(r_1\). Window atom \(\boxplus ^{3}\Diamond a(x_1,y)\) does not hold, since there is no time point t in the selected window such that \(a(x_1,y)\) holds at t. However, the remaining window atoms in P all hold, hence the bodies of rules \(r_2\) and \(r_3\) hold. Thus, a model of P (for D at time 41) must include \(q(x_2,y,z)\) and \(q(x_3,y,z)\). We obtain the answer stream \(D \cup (T,\{ 41 \mapsto \{q(x_2,y,z),q(x_3,y,z)\}\}\)).
Definition 6
(Output). Let \(I=(T,v)\) be the answer stream of program P (for a data stream D) at time t. Then, the output (of P for D at t) is defined by \(v(t) \cap \mathcal {A}^\mathcal {I}\), i.e., the intensional atoms that hold at t.
Given a data stream \(D=(T,v)\), where \(T=[t_1,t_n]\), we obtain an output stream \(S=(T,v)\) by collecting the outputs at consecutive time points, i.e., for each \(t' \in T\), \(v(t')\) is the output for \((T',v_{T'})\), where \(T'=[t_1,t']\). Thus, an output stream is the formal description of the sequence of temporarily valid derivations based on a sequence of answer streams over a timeline. Our goal is to compute it efficiently.
Example 4
(cont’d). Continuing Example 3, the output of P for D at 41 is \(\{q(x_2,y,z),q(x_3,y,z)\}\). The output stream \(S=(T,v)\) is given by \(v = \{ t \mapsto \{ q(x_1,y,z), q(x_2,y,z) \} \mid t=38,39\} \cup \{ t \mapsto \{q(x_2,y,z), q(x_3,y,z)\} \mid t=40,41,42\}\).
3 Incremental Evaluation of LARS Programs
In this section, we describe the efficient output stream computation of Laser. The incremental procedure consists in continuously grounding and then annotating formulae with two time points that indicate when and for how long formulae hold. We thus address two important sources of inefficiency: grounding (including time variables) and model computation.
Our work deliberately focuses on exploiting purely sliding windows. The longer a (potential) step size [9], the less incremental reasoning can be applied. In the extreme case of a tumbling window (i.e., where the window size equals the step size), there is nothing that can be evaluated incrementally. However, as long as two subsequent windows share some data, the incremental algorithm can be beneficial. We now give the intuition of our approach with an example.
Example 5
(cont’d). Consider again the stream of Fig. 1, and assume that we are at \(t=36\), where \(a(x_1,y)\) appears as the first atom in the stream. In rule \(r=q(X,Y,Z) \leftarrow \boxplus ^{3}\Diamond a(X,Y), \boxplus ^{\# 3} \Diamond b(Y,Z)\), the atom matches the window atom \(\alpha = \boxplus ^{3} \Diamond a(X,Y)\), and we obtain a substitution \(\sigma = \{X \mapsto x_1, Y \mapsto y\}\) under which a(X, Y) holds at time 36. However, for \(\alpha \), we can use \(\sigma \) for the next 3 time points due to the size of the window and operator \(\Diamond \). That is, we start considering \(\sigma \) at time 36 and we have a guarantee that the grounding \(\alpha \sigma \) (written postfix) holds until time point 39, which we call the horizon time. We thus write \(\alpha \sigma _{[36,39]}\) for the annotated ground formula, which states that \(\boxplus ^{3}\Diamond a(x_1,y)\) holds at all evaluation time points \(t \in [36,39]\); i.e., at \(t \in [37,39]\), neither the grounding nor the truth of \(\boxplus ^{3}\Diamond a(x_1,y)\) needs to be re-derived.
Definition 7
Let \({\alpha \in \mathcal {F}}\) be a formula, and \({c,h \in \mathbb {N}}\) such that \({c \le h}\), and \(\sigma \) a substitution. Then, \({\alpha \sigma }\) denotes the formula which replaces variables in \(\alpha \) by constants due to \(\sigma \); \(\alpha \sigma _{[c,h]}\) is called an annotated formula, c is called the consideration time and h the horizon time, and the interval [c, h] the annotation.
As illustrated in Example 5, the intended meaning of an annotated formula \(\alpha \sigma _{[c,h]}\) is that formula \(\alpha \sigma \) holds throughout the interval [c, h]. Annotations might overlap.
Example 6
Consider an atom a(y) that streams in at time points 5 and 8. Then, for the formula \(\alpha = \boxplus ^{9} \Diamond a(X)\), we get the substitution \(\sigma =\{X \mapsto y\}\) and an annotation \(a_1=[5,14]\) at \(t=5\), and then \(a_2=[8,17]\) at \(t=8\). That is to say, \(\alpha \sigma = \boxplus ^{9} \Diamond a(y)\) holds at all time points in [5, 14] due to annotation \(a_1\) and at time points in [8, 17] due to \(a_2\), and for each \(t \in [8,14]\) it suffices to retrieve one of these annotations to conclude that \(\alpha \sigma \) holds at t.
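Checking whether a ground formula holds at t then amounts to finding one covering annotation. A minimal sketch, assuming annotated formulae are stored as (formula, c, h) triples; names and the string encoding of the formula are illustrative:

```python
def holds_at(annotations, formula, t):
    """A ground formula holds at t iff some annotation [c, h] covers t."""
    return any(f == formula and c <= t <= h for (f, c, h) in annotations)
```

With the overlapping annotations of Example 6, any single covering annotation suffices; no annotation covers t = 4 or t = 18.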
We note that the tuple window can be processed dually by additionally introducing a consideration count \(c_\#\) and a horizon count \(h_\#\), i.e., an annotated formula \(\alpha \sigma _{[c_\#,h_\#]}\) would indicate that \(\alpha \sigma \) holds when the number of atoms received so far is between \(c_\#\) and \(h_\#\). In essence, the following mechanisms work analogously for time- and tuple-based annotations. We thus limit our presentation to the time-based case for the sake of simplicity.
Algorithm 1. We report in Algorithm 1 the main reasoning algorithm performed by the system. Sets \(I_1,\ldots ,I_n\) contain the annotated formulae at times \(t_1,\ldots ,t_n\); \(S_0,I_0\) are convenience sets needed for the very first iteration. At the beginning of each time point \(t_i\), we first collect in line 3 all facts from the input stream. Each atom \(a \in v(t_i)\) is annotated with \([t_i,t_i]\), i.e., its validity expires at the next time point. In line 4, we expire previous conclusions based on horizon times, i.e., among the annotated intensional atoms \(a_{[c,h]}\), only those with \(t_i \le h\) are retained. Note that we do not delete atoms from the data stream.
In lines 5–14, the algorithm performs a fixed-point computation as usual, where all rules are executed until nothing else can be derived (line 13). Lines 8–11 describe the physical execution of the rules and the materialization of the new derivations. First, line 8 collects all annotated groundings for extended atoms from the body of the considered rule. We discuss the details of the underlying function \( grd \) later (see Algorithm 2). In line 9 we then consider any substitution for the body that currently holds \((c_1,\dots ,c_n \le t_i)\). In order to produce a new derivation, we additionally require that at least one formula was not considered in previous time points (\(\bigvee _{j= 1}^n (c_j = t_i)\)).
The last condition implements a weak version of SNE, which we call \( sSNE \). In fact, it only avoids duplicates between time points and not within the same time point. In order to capture also this last source of duplicates, we would need to add an additional internal counter to track multiple executions of the same rule. We decided not to implement this to limit the space overhead.
Matching substitutions in line 9 are then assigned to the head, where variables which are not used can be dropped as usual. Notice that the consideration/horizon time for the ground head atom is given by the intersection of all consideration/horizon times of the body atoms, i.e., the derivation is guaranteed on the longest interval during which the entire body is guaranteed to hold. If the head is of form \(@_U \alpha \) and holds now, i.e., at \(t_i\), we also add an entry for \(\alpha \) to \(I_i\) (line 11). After the fixed-point computation has terminated (line 13), we can either stream the output at \(t_i\), i.e., \(v'(t_i)\) (line 15), or store it for a later output of the answer stream S after processing the entire timeline (line 17).
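The loop just described can be sketched in Python under strong simplifying assumptions: every rule has the shape \(head(X) \leftarrow \boxplus ^{n_1}\Diamond b_1(X),\dots ,\boxplus ^{n_k}\Diamond b_k(X)\) with all atoms sharing one argument tuple (so a substitution is just that tuple), rules are non-recursive (the fixed-point iteration collapses to a single pass), and only time-based annotations are tracked. All names are illustrative, not Laser's actual code.

```python
def evaluate(rules, data, t_end):
    """rules: list of (head_pred, [(body_pred, window_size), ...]).
    data: dict mapping a time point to a set of (pred, args) facts.
    Returns: dict mapping each time point to the derived head atoms."""
    heads = {h for h, _body in rules}
    ann = {}                     # (pred, args) -> list of [c, h] annotations
    output = {}
    for t in range(t_end + 1):
        # line 3: an incoming fact yields a window-atom annotation [t, t + n]
        for pred, args in data.get(t, set()):
            for _head, body in rules:
                for bp, n in body:
                    if bp == pred:
                        ann.setdefault((bp, args), []).append((t, t + n))
        # line 4: expire annotations whose horizon lies in the past
        for key in list(ann):
            ann[key] = [(c, h) for (c, h) in ann[key] if h >= t]
        # lines 8-11: fire rules whose whole body holds at t
        for head, body in rules:
            cands = {args for (bp, args) in ann if bp == body[0][0]}
            for args in cands:
                picked = []
                for bp, _n in body:
                    live = [a for a in ann.get((bp, args), [])
                            if a[0] <= t <= a[1]]
                    if not live:
                        picked = None
                        break
                    picked.append(max(live, key=lambda a: a[1]))
                # sSNE: derive only if some body annotation is fresh (c = t)
                if picked and any(c == t for c, _h in picked):
                    c = max(c for c, _h in picked)   # intersection of the
                    h = min(h for _c, h in picked)   # body intervals
                    ann.setdefault((head, args), []).append((c, h))
        # output: head atoms with a live annotation (no recomputation needed)
        output[t] = {(p, args) for (p, args), iv in ann.items()
                     if p in heads and any(c <= t <= h for (c, h) in iv)}
    return output
```

For a rule \(q(X) \leftarrow \boxplus ^{3}\Diamond a(X), \boxplus ^{3}\Diamond b(X)\) with a(k) at time 2 and b(k) at time 4, q(k) is derived once at time 4 with annotation [4, 5], read off again for free at time 5, and expires at time 6.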
Example 7
(cont’d). As in Example 6, assume \(\alpha = \boxplus ^{9}\Diamond a(X)\) and input atom a(y) at time 5. Towards annotated groundings of \(\alpha \), we first obtain the substitution \(\sigma =\{X \mapsto y\}\) which can be guaranteed only for time point \(c=5\) for atom a(X), i.e., \(a(X)\sigma _{[5,5]}=a(y)_{[5,5]}\). Based on this, we eventually want to compute \(\alpha \sigma _{[5,14]}\). This is done in two steps. First, the subformula \(\beta = \Diamond a(X)\) is agnostic about the timeline, and its grounding \(\beta \sigma \) gets an annotation \([5,\infty ]\). The intuition behind setting the horizon time to \(\infty \) at this point is that \(\Diamond \beta \) will always hold as soon as \(\beta \) holds once. The restriction to a specific timeline is then carried out when \(\beta \sigma _{[5,\infty ]}\) is handled in the case for \(\boxplus ^{9}\beta \), which limits the horizon to \(\min (c+n,\infty )=14\); any horizon time h received for \(\beta \) that is smaller than 14 would remain.
Thus, the conceptual approach of Algorithm 2 is to obtain the intervals when a subformula holds and adjust the temporal guarantee by either extending or restricting the annotation. Since the operator \(\Box \) evaluates intervals, we have to include the boundaries of the window. That is, if a formula \(\boxplus ^{n} \Box p({\mathbf {x}})\) must be grounded, we call \( grd \) in Algorithm 1 for the entire timeline \([t_1,t_i]\), where \(t_i\) is the current evaluation time. Thus, we get \(t_b=t_1, t_e=t_i\) initially. However, in order for \(\Box p({\mathbf {x}})\) to hold under a substitution \(\sigma \) within the window of the last n time points, \(p({\mathbf {x}}) \sigma \) must hold at every time point in \([t-n,t]\). Thus, the recursive call for \(\boxplus ^n \beta \) limits the timeline to \([t_e-n,t_e]\). Then, the case \(\Box \beta \) seeks to find a sequence of ordered, overlapping annotations \([c_1,h_1],\dots ,[c_n,h_n]\) that subsumes the considered interval \([t_b,t_e]\). In this case, \(\Box \beta \) holds at \(t_e\), but it cannot be guaranteed to hold longer. Thus, when \(\alpha \sigma _{[t_e,t_e]}\) is returned to the case for \(\boxplus ^n\), the horizon time will not be extended.
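The \(\Diamond \) and \(\Box \) cases just described can be sketched as follows, assuming annotations of a ground subformula are given as (c, h) pairs; the functions mirror cases of Algorithm 2 but are only an illustrative approximation, not Laser's actual \( grd \).

```python
INF = float('inf')

def grd_diamond(anns):
    """Diamond: once beta holds at c, ◇beta holds from c onward (horizon ∞)."""
    return [(c, INF) for (c, _h) in anns]

def grd_window(anns, n):
    """⊞^n: clamp each horizon to c + n, keeping smaller horizons as-is."""
    return [(c, min(h, c + n)) for (c, h) in anns]

def grd_box(anns, t_b, t_e):
    """Box over [t_b, t_e]: holds at t_e iff ordered, overlapping annotations
    cover the whole interval; the guarantee does not extend beyond t_e."""
    covered = t_b                    # next time point that still needs cover
    for c, h in sorted(anns):
        if c <= covered and h >= covered:
            covered = h + 1
        if covered > t_e:
            return [(t_e, t_e)]
    return []                        # a gap remains: Box does not hold
```

The tests reproduce Examples 7 and 8: \(a(y)_{[5,5]}\) yields \(\boxplus ^{9}\Diamond a(y)_{[5,14]}\), and three consecutive annotations at 5, 6, 7 yield \(\boxplus ^{2}\Box a(y)_{[7,7]}\).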
Example 8
(cont’d). Consider \(\alpha ' = \boxplus ^{2}\Box a(X)\). Assume that in the timeline [0, 7] at time points \(t=5,6,7\), we received the input a(y), hence \(\boxplus ^{2}\Box a(y)\) has to hold at \(t=7\). When we call (in Algorithm 1) \( grd (\alpha ',I,0,7)\), where \(I=\{ a(y)_{[5,5]}, a(y)_{[6,6]}, a(y)_{[7,7]}\}\), the case for \(\boxplus ^{2}\beta \) will call \( grd (\Box a(X),I,5,7)\). The sequence of groundings as listed in I subsumes [5, 7], i.e., the scope given by \(t_b=5\) and \(t_e=7\), and thus the case for \(\Box \) returns \(\Box a(y)_{[7,7]}\). The annotation remains for \(\alpha '\), i.e., \( grd (\alpha ',I,0,7)=\{ \boxplus ^{2} \Box a(y)_{[7,7]}\}\). Note that when atom a(y) does not hold at time 8, neither does \(\boxplus ^{2}\Box a(y)\). Hence, in contrast to \(\Diamond \), the horizon time is not extended for \(\Box \).
With respect to the temporal aspect, the case for @ works similarly to the one for \(\Diamond \), since both operators amount to existential quantification within the timeline. In addition to \(\Diamond \), the @-operator also includes the substitution \(U \mapsto u\) for the time point u at which the subformula \(\beta \) holds (line 12). In line 11, we additionally take from I the explicit derivations for @-atoms derived so far.
Proposition 2
For every data stream D and program P, Algorithm 1 terminates.
Theorem 1
Let P be a positive plain LARS program, D be a data stream with timeline \(T=[t_1,t_n]\). Then, S is the output stream of P for D iff \(S= Eval (D,P)\).
Tuple-based windows. As noted earlier, our annotation-based approach based on consideration time c and horizon time h works analogously for the tuple-based window by additionally maintaining a consideration count \(c_{\#}\) and a horizon count \(h_\#\) for every ground formula. Each formula can then hold and expire in only one of these dimensions, or in both of them at the same time.
Example 9
Consider again rule r from Example 5. When b(y, z) streams in at time 38 as third atom, we obtain an annotated ground formula \(\boxplus ^{\# 3} \Diamond b(y,z)_{[3\#,5\#]}\). That is, when the fourth and fifth atoms stream in, regardless at which time points, \(\boxplus ^{\# 3} \Diamond b(y,z)\) is still guaranteed to hold.
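A minimal sketch of this counting dimension, assuming atoms are numbered consecutively by arrival order (`tuple_annotation` is an illustrative name):

```python
def tuple_annotation(k, n):
    """k: arrival index of an atom matching a tuple window atom of size n.
    The grounding is guaranteed while the total tuple count received so far
    lies in [k, k + n - 1], regardless of the time points involved."""
    return (k, k + n - 1)   # (consideration count, horizon count)
```

For Example 9, the third arriving atom under a size-3 tuple window yields the annotation \([3\#,5\#]\).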
Adding negation. Notably, our approach can be extended for handling negation as well. In plain LARS as defined in [8], extended atoms \(\beta \) from rule bodies may occur under negation. We can, however, instead assume negation to occur directly in front of atoms: Due to the FLP-semantics [13] of LARS [9], where “\({{\mathrm{not}}}\)” can be identified with \(\lnot \), we get the following equivalences for both \(w \in \{\tau _n, \#_n \}\): \({\lnot \boxplus ^w \Diamond a({\mathbf {x}})} \equiv {\boxplus ^w \Box \lnot a({\mathbf {x}})}\) and \({\lnot \boxplus ^w \Box a({\mathbf {x}})} \equiv {\boxplus ^w \Diamond \lnot a({\mathbf {x}})}\). The case is more subtle for @, since \(@_t \lnot a({\mathbf {x}})\) implies that \(a({\mathbf {x}})\) is false. However, due to the definition of @, \(\lnot @_t a({\mathbf {x}})\) can also hold if t is not contained in the considered timeline. Thus, the equivalence \({\lnot \boxplus ^w @_t a({\mathbf {x}})} \equiv {\boxplus ^w @_t \lnot a({\mathbf {x}})}\) (necessarily) holds only if the timeline contains t. This assumption is safe when we assume that the timeline always covers all considered time points.
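The two equivalences can be sketched as a rewriting on a small formula AST, assuming formulae are represented as nested tuples such as ('not', ('win', w, ('diamond', a))); this representation is purely illustrative. The @-case is deliberately left untouched, in line with the caveat above.

```python
DUAL = {'diamond': 'box', 'box': 'diamond'}

def push_negation(f):
    """Rewrite ¬⊞^w◇a to ⊞^w□¬a and ¬⊞^w□a to ⊞^w◇¬a; @-atoms and
    anything else are returned unchanged."""
    if f[0] == 'not' and isinstance(f[1], tuple) and f[1][0] == 'win':
        _tag, w, sub = f[1]
        if sub[0] in DUAL:
            return ('win', w, (DUAL[sub[0]], ('not', sub[1])))
    return f
```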
Our approach extends naturally to a variant of plain LARS where negation appears only in front of atoms: In addition to the base case \(p({\mathbf {x}})\) in Line 2 in Algorithm 2 we must add a case for a negative literal \(\ell =\lnot p({\mathbf {x}})\). Using standard conventions, we then have to consider all possible substitutions \(\sigma \) for variables in \({\mathbf {x}}\) that occur positively in the same rule r, such that \(p({\mathbf {x}})\sigma \) does not hold.
We obtain a fragment that is significantly more expressive, but results in having multiple answer streams in general: note that plain LARS essentially subsumes normal logic programs, and the program \(a \leftarrow {{\mathrm{not}}}\, b;~ b \leftarrow {{\mathrm{not}}}\, a\) has two answer sets \(\{a\}\) and \(\{b\}\). Analogously, we get multiple answer streams by allowing such loops through negation. To retain both unique model semantics and tractability, we propose restricting to stratified negation, i.e., allowing negation but no loops through negation. Then, we can add to Algorithm 1 an additional forloop around lines 6–13 to compute the answer stream stratum by stratum bottom up as usual. In fact, our implementation makes use of this extension.
4 Evaluation
We evaluate the performance of Laser on two dimensions: First, we measure the impact of our incremental procedures on several operators by microbenchmarking the system on special single-rule programs. Second, we compare the performance against the state of the art on more realistic programs.
Streams. Unfortunately, we could not use some well-known stream reasoning benchmarks (e.g., SRBench [28], CSRBench [12], LSBench [20], and CityBench [2]) because (i) we need to manually change the window sizes and the stream rate in order to benchmark our incremental approach, but this is often not supported in these benchmarks; (ii) in order to be effective, a microbenchmark needs to introduce as little overhead as possible; and (iii) we needed to make sure that all reasoners return the same results for a fair comparison, which was easier with a custom data generator that we wrote for this purpose.
State of the art. In line with the current literature, we selected C-SPARQL [5] and CQELS [19] as our main competitors. For LARS operators that are not supported by these engines, we compare Laser with Ticker [10], another recent engine for (non-stratified) plain LARS programs. Ticker comes with two reasoning modes: a fully incremental one, and one that uses an ASP encoding which is then evaluated by the ASP solver Clingo [15]. The incremental reasoning mode was not available at the time of this evaluation. Thus, our evaluation against Ticker concerns only the reasoning mode based on Clingo.
Data generation. Unfortunately, each engine has its own routines for reading the input. As a result, we were compelled to develop custom data generators to guarantee fairness. A key problem is that CQELS processes every new data item immediately after its arrival, in contrast to Laser and C-SPARQL, which process them in batches. Hence, to control the number of triples that stream into CQELS, and to make sure that all engines receive an equal number of triples at every time point, we configured each data generator to issue a triple at calculated intervals. For the same reason, we report the evaluation results as the average runtime per input triple and not per time point.
Experimental platform. The experiments were performed on a machine with a 32-core Intel(R) Xeon(R) 2.60 GHz CPU and 256 GB of memory. We used Java 1.8 for C-SPARQL and CQELS and PyPy 5.8 for Laser. We set the initial Java heap size to 20 GB and the maximum heap size to 80 GB to minimize potential negative effects of JVM garbage collection. For Ticker, we used Clingo 5.1.0.
Window-Diamond. The standard snapshot semantics employed in C-SPARQL and CQELS selects recent data and then abstracts away the timestamps. In LARS, this amounts to using \(\Diamond \) to existentially quantify within a window. Here, we evaluate how efficiently each engine can handle this case.
We use the rule \(q(A,B) \leftarrow \boxplus ^n\Diamond p(A,B)\), where a predicate of form r(A, B) corresponds to a triple \(\langle A, r, B \rangle \). The window size and the stream rate (i.e., the number of atoms streaming into the system at every time point) are the experiment parameters. We create a number of artificial streams which produce a series of unique atoms with predicate p at different rates; we vary window sizes from 1 s to 80 s and the stream rate from 200 to 800 triples per second (t/s).
Figure 2(a) reports the average runtime per input triple for each engine. The figure shows that Laser is faster than the other engines. Furthermore, we observe that the average runtime of Laser grows significantly more slowly with the window size as well as with the stream rate. Here, incremental reasoning is clearly beneficial.
(i) Laser is significantly faster than CQELS and C-SPARQL in all configurations of window sizes and stream rates. (ii) The difference becomes bigger for larger window sizes, for which the benefit of incremental evaluation increases.
We profiled the execution of Laser with the larger windows and stream rates and discovered that only about half of the time is spent on the join, while the other half is needed to return the results. We also performed an experiment where we deactivated sSNE and did a normal join instead. We observed that sSNE is slightly slower than the normal join with small window sizes, but as the window size and stream rate increase, sSNE is significantly faster. In the best case, activating sSNE produced a runtime which was 10 times lower.
Evaluating multiple rules. We now evaluate the performance of Laser in a situation where the program contains multiple rules. In C-SPARQL or CQELS, this translates to a scenario with multiple standing queries. To do so, we ran a series of experiments where we changed the number of rules and the window sizes (the stream rate was constant at 200 t/s), using the same rule and data generator as in the data join benchmark. Figure 3(b) presents the average runtime (per triple). We see that also in this case Laser outperforms both C-SPARQL and CQELS, except in the very last case, where none of the systems finished in time.
Cooling use case. So far, we have evaluated the performance using analytic benchmarks. Now, we measure the performance of Laser with a program for a cooling system. The program of Fig. 4 determines, based on a water temperature stream, whether the system is working under normal conditions, is too hot and produces steam, or is too cold and the water is freezing.
5 Related Work and Conclusion
Related Work. The vision of stream reasoning was proposed by Della Valle et al. in [11]. Since then, numerous publications have studied different aspects of stream reasoning, such as extending SPARQL for stream querying [4, 19], building stream reasoners [4, 19, 22], scalable stream reasoning [16], and ASP models for stream reasoning [14]. However, due to the lack of a standardized formalism for RDF stream processing, each of these engines provides a different set of features, and results are hard to compare. A survey of these techniques is available in [21]. Our work differs in that it is based on LARS [9], one of the first formal semantics for stream reasoning with window operators.
An area closely related to stream processing is incremental reasoning, which has been the subject of a large volume of research [23, 27]. In this context, [6] describes a technique that adds expiration times to RDF triples so that they can be dropped when they are no longer valid. Nonetheless, this approach does not support expressive operators such as \(\Box \) and @, which our engine supports. In a similar way, [18] proposes another incremental algorithm for processing streams, which again boils down to efficiently identifying expired information. We showed that our approach outperforms their work. Next, [8] proposes a technique to incrementally update an answer stream of a so-called s-stratified plain LARS program by extending truth maintenance techniques. While [8] focuses on multiple models, we aim at highly efficient reasoning for use cases that guarantee single models. Similarly, the incremental reasoning mode of Ticker [10] focuses on model maintenance rather than on high performance. Stream reasoning based on ASP was also explored in a probabilistic context [25], which however did not employ windows.
Conclusion. We presented Laser, a new stream reasoner built on the rule-based framework LARS. Laser distinguishes itself by supporting expressive reasoning without giving up efficient computation. Our implementation, which is freely available, has competitive performance with the current state of the art. This indicates that expressive reasoning is possible also on highly dynamic streams of data. Future work can be done on several fronts: Practically, our techniques extend naturally to further window operators, such as tumbling windows or tuple-based windows with pre-filtering. From a theoretical perspective, the question arises which variations or more involved syntactic fragments of LARS are compatible with the presented annotation-based incremental evaluation. Moreover, our support of stratified negation is prototypical and can be made more efficient. More generally, investigations on the system-related research question of reducing the runtimes even further are important to tackle the increasing number and volume of streams emerging from the Web.
References
1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases, vol. 8. Addison-Wesley, Reading (1995)
2. Ali, M.I., Gao, F., Mileo, A.: CityBench: a configurable benchmark to evaluate RSP engines using smart city datasets. In: Arenas, M., et al. (eds.) ISWC 2015. LNCS, vol. 9367, pp. 374–389. Springer, Cham (2015). doi:10.1007/978-3-319-25010-6_25
3. Anicic, D., Fodor, P., Rudolph, S., Stojanovic, N.: EP-SPARQL: a unified language for event processing and stream reasoning. In: Proceedings of WWW, pp. 635–644 (2011)
4. Barbieri, D.F., Braga, D., Ceri, S., Della Valle, E., Grossniklaus, M.: C-SPARQL: SPARQL for continuous querying. In: Proceedings of WWW, pp. 1061–1062. ACM (2009)
5. Barbieri, D.F., Braga, D., Ceri, S., Della Valle, E., Grossniklaus, M.: C-SPARQL: a continuous query language for RDF data streams. Int. J. Semant. Comput. 4(1), 3–25 (2010)
6. Barbieri, D.F., Braga, D., Ceri, S., Della Valle, E., Grossniklaus, M.: Incremental reasoning on streams and rich background knowledge. In: Aroyo, L., Antoniou, G., Hyvönen, E., Teije, A., Stuckenschmidt, H., Cabral, L., Tudorache, T. (eds.) ESWC 2010. LNCS, vol. 6088, pp. 1–15. Springer, Heidelberg (2010). doi:10.1007/978-3-642-13486-9_1
7. Bazoobandi, H.R., Beck, H., Urbani, J.: Expressive stream reasoning with Laser. CoRR, abs/1707.08876 (2017)
8. Beck, H., Dao-Tran, M., Eiter, T.: Answer update for rule-based stream reasoning. In: Proceedings of IJCAI, pp. 2741–2747 (2015)
9. Beck, H., Dao-Tran, M., Eiter, T., Fink, M.: LARS: a logic-based framework for analyzing reasoning over streams. In: Proceedings of AAAI, pp. 1431–1438 (2015)
10. Beck, H., Eiter, T., Folie, C.: Ticker: a system for incremental ASP-based stream reasoning. TPLP (2017, to appear)
11. Della Valle, E., Ceri, S., Van Harmelen, F., Fensel, D.: It's a streaming world! Reasoning upon rapidly changing information. IEEE Intell. Syst. 24(6), 83–89 (2009)
12. Dell'Aglio, D., Calbimonte, J.P., Balduini, M., Corcho, O., Della Valle, E.: On correctness in RDF stream processor benchmarking. In: Alani, H., et al. (eds.) ISWC 2013. LNCS, vol. 8219, pp. 326–342. Springer, Heidelberg (2013). doi:10.1007/978-3-642-41338-4_21
13. Faber, W., Leone, N., Pfeifer, G.: Recursive aggregates in disjunctive logic programs: semantics and complexity. In: Alferes, J.J., Leite, J. (eds.) JELIA 2004. LNCS (LNAI), vol. 3229, pp. 200–212. Springer, Heidelberg (2004). doi:10.1007/978-3-540-30227-8_19
14. Gebser, M., Grote, T., Kaminski, R., Obermeier, P., Sabuncu, O., Schaub, T.: Answer set programming for stream reasoning. arXiv preprint arXiv:1301.1392 (2013)
15. Gebser, M., Kaminski, R., Kaufmann, B., Schaub, T.: Clingo = ASP + control: preliminary report. CoRR, abs/1405.3694 (2014)
16. Hoeksema, J., Kotoulas, S.: High-performance distributed stream reasoning using S4. In: OrdRing Workshop at ISWC (2011)
17. Komazec, S., Cerri, D., Fensel, D.: Sparkwave: continuous schema-enhanced pattern matching over RDF data streams. In: DEBS, pp. 58–68 (2012)
18. Le-Phuoc, D.: Operator-aware approach for boosting performance in RDF stream processing. Web Semant. Sci. Serv. Agents World Wide Web 42, 38–54 (2017)
19. Le-Phuoc, D., Dao-Tran, M., Xavier Parreira, J., Hauswirth, M.: A native and adaptive approach for unified processing of linked streams and linked data. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011. LNCS, vol. 7031, pp. 370–388. Springer, Heidelberg (2011). doi:10.1007/978-3-642-25073-6_24
20. Le-Phuoc, D., Dao-Tran, M., Pham, M.D., Boncz, P., Eiter, T., Fink, M.: Linked stream data processing engines: facts and figures. In: Cudré-Mauroux, P., et al. (eds.) ISWC 2012. LNCS, vol. 7650, pp. 300–312. Springer, Heidelberg (2012). doi:10.1007/978-3-642-35173-0_20
21. Margara, A., Urbani, J., Van Harmelen, F., Bal, H.: Streaming the web: reasoning over dynamic data. Web Semant. Sci. Serv. Agents World Wide Web 25, 24–44 (2014)
22. Mileo, A., Abdelrahman, A., Policarpio, S., Hauswirth, M.: StreamRule: a nonmonotonic stream reasoning system for the semantic web. In: International Conference on Web Reasoning and Rule Systems, pp. 247–252 (2013)
23. Motik, B., Nenov, Y., Piro, R., Horrocks, I.: Incremental update of datalog materialisation: the backward/forward algorithm. In: Proceedings of AAAI, pp. 1560–1568 (2015)
24. Nenov, Y., Piro, R., Motik, B., Horrocks, I., Wu, Z., Banerjee, J.: RDFox: a highly-scalable RDF store. In: Arenas, M., et al. (eds.) ISWC 2015. LNCS, vol. 9367, pp. 3–20. Springer, Cham (2015). doi:10.1007/978-3-319-25010-6_1
25. Nickles, M., Mileo, A.: A hybrid approach to inference in probabilistic non-monotonic logic programming. In: Proceedings of the 2nd International Workshop on Probabilistic Logic, pp. 57–68 (2015)
26. Urbani, J., Jacobs, C., Krötzsch, M.: Column-oriented datalog materialization for large knowledge graphs. In: Proceedings of AAAI, pp. 258–264 (2016)
27. Urbani, J., Margara, A., Jacobs, C., Van Harmelen, F., Bal, H.: DynamiTE: parallel materialization of dynamic RDF data. In: Alani, H., et al. (eds.) ISWC 2013. LNCS, vol. 8218, pp. 657–672. Springer, Heidelberg (2013). doi:10.1007/978-3-642-41335-3_41
28. Zhang, Y., Duc, P.M., Corcho, O., Calbimonte, J.P.: SRBench: a streaming RDF/SPARQL benchmark. In: Cudré-Mauroux, P., et al. (eds.) ISWC 2012. LNCS, vol. 7649, pp. 641–657. Springer, Heidelberg (2012). doi:10.1007/978-3-642-35176-1_40