# A Query Model to Capture Event Pattern Matching in RDF Stream Processing Query Languages

## Abstract

The current state of the art in RDF Stream Processing (RSP) proposes several models and implementations to combine Semantic Web technologies with Data Stream Management System (DSMS) operators like windows. Meanwhile, only a few solutions combine Semantic Web and Complex Event Processing (CEP), which includes relevant features, such as identifying sequences of events in streams. Current RSP query languages that support CEP features have several limitations: EP-SPARQL can identify sequences, but its selection and consumption policies are not all formally defined, while C-SPARQL offers only naive support for pattern detection through a timestamp function. In this work, we introduce an RSP query language, called RSEP-QL, which supports both DSMS and CEP operators, with a special interest in formalizing CEP selection and consumption policies. We show that RSEP-QL captures EP-SPARQL and C-SPARQL, and offers features going beyond the ones provided by current RSP query languages.

## 1 Introduction

Processing heterogeneous and dynamic data is a challenging research topic and has a wide range of applications in real-world scenarios. Different models, languages, and systems have been proposed in the last years to handle streams on the Web, combining Semantic Web technologies with Complex Event Processing (CEP) [18] and Data Stream Management Systems (DSMS) [5] features. These languages and systems, commonly labeled under the RDF Stream Processing (RSP) name, are solutions that extend SPARQL with stream processing features, based on either the CEP or DSMS paradigm.

A problem that recently emerged is the heterogeneity of those solutions [11, 13]. Every RSP engine has unique features that are not replicable by others; moreover, even when the same feature is supported by two or more engines, the behavior and the produced output can be different and hardly comparable. In our previous work, namely RSP-QL [14] and LARS [7], we developed models to capture the RSP features inspired by the DSMS paradigm, e.g., time-based sliding windows and aggregations over streams.

In this paper, we study the integration of the currently available CEP features in RSP engines into RSP-QL, by investigating the research question: *“Is it possible to extend RSP-QL to enable the detection of expressive event patterns over RDF streams?”* We give an answer with RSEP-QL, an RSP query model that incorporates CEP at its core.

RSEP-QL is a reference model^{1} and has several possible uses: (a) to provide a common framework to explain the behavior of existing RSP solutions, enabling their comparison; (b) to support software architects in designing new RSP implementations, testers in designing benchmarks and evaluations, and researchers in developing new research on top of a general model; (c) to act as a formal model to define a standardized language that embraces the most prominent features of existing RSP languages.

Combining CEP and DSMS features in a unique model is a step towards filling the gap between RSP and the non-semantically-aware stream processing engines available on the market (e.g., Oracle Event Processor, ESPER, IBM InfoSphere Streams) [10]. There are indeed several motivations behind combining DSMS and CEP. It is clearly possible to mix different DSMS and CEP languages to achieve the desired tasks, but there are drawbacks, e.g., the need to learn multiple languages, the limited possibility for query optimizations, and a potentially higher consumption of resources.

Our contributions are: (1) We elicit a set of requirements to design an RSP query model that supports both DSMS and CEP features. (2) We adapt our model to process RDF graphs as stream elements, following the current guidelines of the W3C RSP Community Group (RSP-CG).^{2} (3) We introduce event patterns to capture CEP features of existing RSP engines, most notably the sequencing operator, and provide syntax and semantics as extensions of SPARQL. (4) We formally define selection and consumption policies, to capture the operational semantics of the CEP-inspired RSP engines, contrary to current approaches that consider policies at the implementation level.

## 2 Related Work and Requirements

RSP engines emerged in recent years, with the goal of extending RDF and SPARQL to process RDF streams. They can be broadly divided into two groups. RSPs influenced by CEP reactively process the input streams to identify relevant events and sequences of them. EP-SPARQL [3] is one of the first RSP engines to adopt some of these complex pattern operators. Other such recent approaches include Sparkwave [17] and Instans [20]. On the other hand, approaches inspired by DSMS exploit sliding window mechanisms to capture a recent and finite portion of the input data, enabling their processing through SPARQL operators [15] in an atemporal fashion. C-SPARQL [6], CQELS [19], and SPARQL\(_{stream}\) [9] are representative examples of this group.

**[R1] RSEP-QL should process RDF graph-based streams.** While in early RSP data models the stream data items are represented by single RDF statements, the recent standardization effort from W3C RSP-CG proposes to adopt RDF graphs as items^{3}. The latter model generalizes the former, as a stream of time-annotated RDF statements can be modeled as a stream of time-annotated RDF graphs, each containing one statement. In this sense, addressing [R1] is important to realize a generic RDF stream query model.

**[R2] RSEP-QL must preserve the DSMS features captured by RSP-QL.** The introduction of CEP features in the model should not lead to incompatibilities with the RSP models we already captured in RSP-QL [14]. This requirement is important to guarantee that RSEP-QL is generic enough to model the operational semantics of different systems.

**[R3] RSEP-QL should capture the CEP features of existing RSP engines.** In this work, we focus on the \(\mathrm {SEQ}\) operator: the most basic building block in CEP. Intuitively, \(E_1~\!\mathrm {SEQ}~E_2\) identifies events matching pattern \(E_1\) followed by those matching \(E_2\). Even if it may seem straightforward to formalize this operator, its execution in different engines produces different and hardly comparable results. We, therefore, refine [R3] into two sub-requirements, associated with the two engines we aim at capturing, EP-SPARQL and C-SPARQL. To illustrate our idea, we use the RDF stream depicted in Fig. 1.

**[R3.1] RSEP-QL should capture the EP-SPARQL SEQ behavior.** To the best of our knowledge, EP-SPARQL is the RSP language with the largest support for CEP features, with a wide range of operators to define complex events, e.g., \(\mathrm {SEQ}\), OPTIONALSEQ, EQUALS and EQUALSOPTIONAL. EP-SPARQL supports three different policies [2]:

- *unrestricted*: all input elements are selected for matching the event patterns.
- *chronological*: only the earliest inputs that can be matched are selected for matching the event patterns; then, they are ignored in the next evaluations.
- *recent*: only the latest inputs that can be matched are selected for matching the event patterns; then, they are ignored in the next evaluations.

The table of Fig. 1 shows the different behaviors of these three settings. Assume that there are two evaluations at time points 8 and 10. *Unrestricted* returns \(e_1,e_2,e_3\) at 8 and \(e_4\) at 10. *Chronological* returns only \(e_1\) and \(e_2\) at 8. *Recent* returns only \(e_2\) and \(e_3\) at 8. Furthermore, both *chronological* and *recent* do not return any event at 10 because \(({:\!\! a _1}~{:\!\! p }~{:\!\! b _1})\) were already consumed by the previous evaluation.

Notably, the EP-SPARQL query does not change in the three cases, as the setting is a configuration parameter set at the startup of the engine. Moreover, independently of the setting, all the system outputs happen as soon as they are available.

**[R3.2] RSEP-QL should capture the C-SPARQL SEQ behavior.** C-SPARQL is based on DSMS techniques, but it offers naive support for some CEP features. C-SPARQL implements a function, named \( timestamp \), that takes as input a triple pattern and returns the time instant associated with the *most recent* matched triple. This function can be used inside a FILTER clause to express time constraints among events.

The evaluation in C-SPARQL strictly relies on the notion of time-based sliding window, which selects a portion of the stream to be used as input and the time instants on which evaluations occur. With regard to the above example, with a sliding window of length 7 that slides by 1 at each step, C-SPARQL outputs \(e_3\) at time 8 and has no output at 10, not because the input triples were consumed, but because it considers only the two triples \(({:\!\! b _1}~{:\!\! q }~{:\!\! c _1})\) and \(({:\!\! a _3}~{:\!\! p }~{:\!\! b _3})\), which do not match the sequencing pattern.

### Remarks

While EP-SPARQL is an engine for performing CEP, C-SPARQL is a DSMS-inspired RSP engine that offers naive support for event pattern matching. As shown above, even with simple event patterns, the two systems behave in completely different ways, and neither of them is able to capture the other. It is out of the scope of this paper to determine which system is the most suitable given a use case and its set of requirements. Our goal is to build a model able to capture the behavior of both engines. In this sense, satisfying both [R3.1] and [R3.2] is the minimal condition to assess that RSEP-QL is a common framework to describe the semantics of RSP engines.

## 3 Anatomy of RSEP-QL Queries

A SPARQL query is defined by a signature of the form \((E, DS , QF )\), that indicates the evaluation of an algebraic expression *E* over a set of data \( DS \) to produce an answer formatted according to a query form \( QF \) [16]. This section proposes RSEP-QL queries that extend SPARQL’s queries with the following features: (1) the capability to take as input not only RDF graphs but also RDF streams; (2) a set of operators to access/process streams; and (3) an evaluation paradigm moving from one-time to continuous semantics.

### 3.1 Data Model

There are two main kinds of input data in the context of stream processing. The first are streams, defined as sequences of highly dynamic and time-annotated data such as sensor data and micro-posts. The second type is contextual (or background) data, which is usually static or quasi-static and is used to enrich the streams and solve more sophisticated tasks, e.g., sensor locations, user profiles, etc. In RSP, contextual data may be captured by RDF graphs, while streams are captured with RDF streams.

**RDF Streams.** To fulfill [R1], we adopt the notion of time-annotated RDF graphs as elements of RDF streams, following the data model under design by RSP-CG. We define a *timeline* *T* as an infinite, discrete, ordered sequence of time instants \((t_1, t_2,\ldots )\), where \(t_i\in \mathbb {N}\) and for all \(i>0\), it holds that \(t_{i+1}-t_i\) is a constant, called the *time unit* of *T*.

We now extend the definition of RDF graphs with time annotations and then define RDF streams as sequences of them.

### Definition 1

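**(RDF Stream).** A *timestamped RDF graph* is a pair (*G*, *t*), where *G* is an RDF graph and \(t\in T\) is a time instant. An *RDF stream* *S* is a (potentially) unbounded sequence of timestamped RDF graphs in a non-decreasing time order:

\[S=(G_1,t_1),(G_2,t_2),(G_3,t_3),\ldots \]

where, for every \(i>0\), \((G_i,t_i)\) is a timestamped RDF graph and \(t_i\le t_{i+1}\).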

Other streaming data model profiles exist and are currently under study by the RSP-CG. In this work, we focus on the model where the time annotation is represented by one time instant, as it is a usual case that appears in several scenarios.

### Example 1

Figure 1 illustrates a stream \(S=(G_1,2),(G_2,4),(G_3,6),(G_4,8),(G_5,10),\ldots \), where each \(G_i\) contains the depicted RDF triples. \(\square \)
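As an illustration of Definition 1, the following minimal Python sketch models timestamped RDF graphs and checks the non-decreasing time order on a finite prefix of a stream. The class and function names are hypothetical, and the triples are placeholders, since the content of Fig. 1 is not reproduced here.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class TimestampedGraph:
    graph: FrozenSet[Triple]  # the RDF graph G
    t: int                    # the time instant t on the timeline T


def is_rdf_stream(items: List[TimestampedGraph]) -> bool:
    """Check the non-decreasing time order on a finite prefix of a stream."""
    return all(a.t <= b.t for a, b in zip(items, items[1:]))


# Placeholder triples (assumption): not the actual content of Fig. 1.
S = [
    TimestampedGraph(frozenset({(":a1", ":p", ":b1")}), 2),
    TimestampedGraph(frozenset({(":a2", ":p", ":b2")}), 4),
    TimestampedGraph(frozenset({(":b1", ":q", ":c1")}), 6),
    TimestampedGraph(frozenset({(":b2", ":q", ":c2")}), 8),
]
```

Two graphs may carry the same time instant, which is why the check uses `<=` rather than `<`.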

**Time-Varying Graphs.** Statements in RDF graphs are atemporal and capture a given situation in a snapshot. We introduce the notion of time-varying graphs to capture the evolution of the graph over time (similar to time-varying relations in [4]).

### Definition 2

**(Time-Varying Graph).** A *time-varying graph* \(\overline{G}\) is a function that relates time instants \({t\in T}\) to RDF graphs:

\[\overline{G}: T\rightarrow \{G \mid G \text { is an RDF graph}\}\]

An *instantaneous RDF graph* \(\overline{G}(t)\) is the RDF graph identified by the time-varying graph \(\overline{G}\) at a given time instant *t*.

RDF streams and time-varying graphs differ on the time information: while in the former time annotations are accessible and processable by the stream processing engine, in the latter there is no explicit time annotation. In this sense, *t* in Definition 2 can be viewed as a timestamp denoting the access time of the engine to the graph content.

### 3.2 RSEP-QL Dataset

A SPARQL dataset is a set of pairs (*u*, *G*), where \(u \in I\cup \{ def \}\)^{4} is an identifier for an RDF graph *G*. This section proposes the notion of dataset for RSEP-QL. It differs from SPARQL datasets in the presence of streams, and in that RSEP-QL dataset elements may vary over time. Streams are potentially infinite, and the use of windows provides a finite (and usually recent) view of portions of the streams for practical processing. We now introduce a generic notion of window functions, inspired by LARS [7].

### Definition 3

**(Window Function).** A *window function* *W* with a vector of *window parameters* \({{\varvec{p}}}\), denoted as \( W[{{\varvec{p}}}]\), takes as input a stream *S* and a time instant \(t \in T\), and produces a *substream* (aka. *window*) \(S'\) of *S*, i.e., a finite subsequence of *S*.

This generic notion can be instantiated with specific parameters \({{\varvec{p}}}\) to realize window functions used in practice. In the following, we present a set of window functions that constitute the basis of the operators defined in the next sections.

**Time-Based (sliding) Windows.** A *time-based window* function \(W^\tau \) is defined through \({\varvec{p}}=(\alpha , \beta )\), where \(\alpha \) is the width and \(\beta \) is the sliding step. It slides every \(\beta \) time units and filters input graphs of the last \(\alpha \) time units. Let \(t'=\lfloor \frac{t}{\beta }\rfloor \!\cdot \!\beta \); we have that:

\[W^\tau (S,t)=(G_j,t_j),\ldots ,(G_k,t_k)\]

where [*j*, *k*] is the maximal interval s.t. \(\forall i\in [j,k]:(G_i,t_i)\in S\wedge t'-\alpha <t_i\le t'.\)

**Landmark Windows.** A *landmark window* function \(W^\lambda \) defined through \({{\varvec{p}}}=(t_0)\) returns the content of the input stream from \(t_0\):

\[W^\lambda (S,t)=(G_j,t_j),\ldots ,(G_k,t_k)\]

where [*j*, *k*] is the maximal interval s.t. \(\forall i\in [j,k]:(G_i,t_i)\in S\wedge t_0\le t_i\le t.\)

As we show below, landmark windows are useful to capture the behaviour of event pattern systems like EP-SPARQL. In fact, they offer views over large portions of the stream, without the eviction mechanism typical of sliding windows.

**Identity Window.** The *identity window* function \(W^{ id }\) is introduced to give a uniform definition of event pattern evaluation later. It simply returns the input stream, that is:

\[W^{ id }(S,t)=S\]

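**Interval Windows.** The *interval-based* (or fixed) window function \(W^\sqcup \) is defined through \({{\varvec{p}}}=(t',t'')\) and returns the part of the input stream bounded by \([t',t'']\):

\[W^\sqcup (S,t)=(G_j,t_j),\ldots ,(G_k,t_k)\]

where [*j*, *k*] is the maximal interval s.t. \(\forall i\in [j,k]:(G_i,t_i)\in S\wedge t'\le t_i\le t''.\)

When the parameters are clear from the context, we write \(W[{{\varvec{p}}}](S,t)\) simply as *W*(*S*, *t*). Notably, window functions can be nested; for example, we can have \(W^\sqcup (W^\tau (S,t),t)\). We denote the nesting by the \(\bullet \) operator. Formally:

\[W_1\bullet W_2(S,t)=W_1(W_2(S,t),t)\]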

### Example 2

Consider the stream *S* from Example 1. Here are some results of applying the time-based, landmark, and interval window functions \(W^\tau \), \(W^\lambda ,\) and \(W^\sqcup \) on this stream:
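The window functions of this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions: stream items are reduced to a graph name and a timestamp, the function names are hypothetical, and the parameter values are chosen for illustration only (they are not necessarily those of the original example).

```python
from typing import Callable, List, Tuple

Item = Tuple[str, int]                      # (graph name, timestamp)
Stream = List[Item]
Window = Callable[[Stream, int], Stream]    # W(S, t) -> substream


def time_based(alpha: int, beta: int) -> Window:
    """W^tau with width alpha and sliding step beta."""
    def w(s: Stream, t: int) -> Stream:
        t_prime = (t // beta) * beta        # t' = floor(t / beta) * beta
        return [(g, ti) for g, ti in s if t_prime - alpha < ti <= t_prime]
    return w


def landmark(t0: int) -> Window:
    """W^lambda: everything from t0 up to the evaluation time t."""
    return lambda s, t: [(g, ti) for g, ti in s if t0 <= ti <= t]


def identity() -> Window:
    """W^id: returns the input (here, a finite prefix) unchanged."""
    return lambda s, t: list(s)


def interval(t1: int, t2: int) -> Window:
    """W^sqcup: the part of the stream bounded by [t1, t2]."""
    return lambda s, t: [(g, ti) for g, ti in s if t1 <= ti <= t2]


def nest(w1: Window, w2: Window) -> Window:
    """W1 . W2 (S, t) = W1(W2(S, t), t)."""
    return lambda s, t: w1(w2(s, t), t)


S = [("G1", 2), ("G2", 4), ("G3", 6), ("G4", 8), ("G5", 10)]

sliding = time_based(5, 1)(S, 8)                   # graphs with 3 < t_i <= 8
coarse = time_based(4, 3)(S, 8)                    # t' = 6, so 2 < t_i <= 6
since_1 = landmark(1)(S, 8)                        # graphs with 1 <= t_i <= 8
nested = nest(interval(0, 5), landmark(1))(S, 8)   # graphs with 1 <= t_i <= 5
```

Because the stream is ordered by time, filtering on the timestamp bounds yields exactly the maximal contiguous interval [*j*, *k*] required by the definitions.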

**Dataset.** We now formally define RSEP-QL datasets, as sets of pairs of an identifier \(u \in I\cup \{ def \}\) and either a window function applied to a stream or a time-varying graph.

### Definition 4

**(RSEP-QL Dataset).** An *RDF streaming dataset* \( SDS \) is a set consisting of an (optional) default time-varying graph \(\overline{G}_0\), \(n\ge 0\) named time-varying graphs, and \(m\ge 0\) named window functions applied to a set of streams \({\mathbf {S}=\{S_1,\ldots ,S_k\}}\):

\[ SDS =\{( def ,\overline{G}_0),(g_1,\overline{G}_1),\ldots ,(g_n,\overline{G}_n),(w_1,W_1(S_{\ell _1})),\ldots ,(w_m,W_m(S_{\ell _m}))\}\]

where

- \(\overline{G}_0\) is the default time-varying graph,
- \(g_i \in I\) is the identifier of the time-varying graph \(\overline{G}_i\),
- \(w_j\in I\) is the identifier of the named window function \( W_j\) over the RDF stream \(S_{\ell _j} \in \mathbf {S}\).

We denote by \( ids ( SDS )=\{ def \}\cup \{g_1,\ldots ,g_n\}\cup \{w_1,\ldots ,w_m\}\) the set of symbols identifying the time-varying graphs and windows in \( SDS \).

An important difference that emerges comparing the SPARQL and the RSEP-QL dataset is that the former contains RDF graphs and is fixed in the sense that SPARQL datasets are composed according to the query (e.g. FROM clauses), and the set of elements included in a dataset does not vary over time. On the other hand, RSEP-QL datasets contain RDF streams and time-varying graphs that are updated as time proceeds.

### Example 3

Let \(W^\lambda _1\) and \(W^\tau _2\) be a landmark and a time-based window function with respective parameters \({{\varvec{p}}}_1=(1)\) and \({{\varvec{p}}}_2=(5,1)\). Then, \( SDS =\{(w_1,W^\lambda _1(S)),(w_2,W^\tau _2(S))\}\) is an RDF streaming dataset, where *S* is from Example 1. \(\square \)

### 3.3 RSEP-QL Patterns

To fulfill [R2] and [R3], we introduce RSEP-QL operators to enable DSMS and CEP features. We then extend SPARQL graph patterns to support these operators on streams.

In SPARQL, the construction of the query relies on graph patterns. The elementary building block of graph patterns is the *Basic Graph Pattern* (BGP), i.e., a set of triple patterns \((t_s,t_p,t_o)\in (I \cup B \cup L \cup V)\times (I \cup V)\times (I \cup B \cup L \cup V)\). More complex patterns are recursively defined on top of BGPs using operators such as join and union^{5}.

Concerning DSMS operations, we introduce the *window graph pattern*, defined as an expression \((\mathrm {WINDOW}~w_j~P)\), where *P* is a SPARQL graph pattern and \(w_j\in I\) is an IRI. Intuitively, \(\mathrm {WINDOW}\) indicates that *P* should be evaluated over the content of the window identified by \(w_j\) in the dataset (similarly to the SPARQL GRAPH operator).

To support CEP features, we introduce *event patterns* as follows.

- (1) If *P* is a Basic Graph Pattern and \(w\in I\), then the expression \((\mathrm {EVENT}~w~P)\) is an event pattern, named *Basic Event Pattern* (BEP)^{6};
- (2) If \(E_1\) and \(E_2\) are event patterns, then the expressions \((\mathrm {FIRST}~E_1)\), \((\mathrm {LAST}~E_1)\), \((E_1~\mathrm {SEQ}~E_2)\) are event patterns.

To relate graph and event patterns, we define the *event graph pattern* as \((\mathrm {MATCH}~E)\) where *E* is an event pattern.
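The pattern grammar above can be mirrored by a small abstract syntax tree. The following Python sketch is one possible encoding, not a prescribed one: the class names and the helper `windows_in` are hypothetical, and triple-pattern terms starting with `?` are taken to be variables.

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union

TriplePattern = Tuple[str, str, str]   # terms starting with "?" are variables


@dataclass(frozen=True)
class Event:                 # (EVENT w P): a Basic Event Pattern
    window: str
    bgp: Tuple[TriplePattern, ...]


@dataclass(frozen=True)
class First:                 # (FIRST E)
    e: "EventPattern"


@dataclass(frozen=True)
class Last:                  # (LAST E)
    e: "EventPattern"


@dataclass(frozen=True)
class Seq:                   # (E1 SEQ E2)
    left: "EventPattern"
    right: "EventPattern"


EventPattern = Union[Event, First, Last, Seq]


@dataclass(frozen=True)
class Match:                 # (MATCH E): the event graph pattern
    e: EventPattern


def windows_in(e: EventPattern) -> Set[str]:
    """Identifiers of the windows referenced by the BEPs of an event pattern."""
    if isinstance(e, Event):
        return {e.window}
    if isinstance(e, Seq):
        return windows_in(e.left) | windows_in(e.right)
    return windows_in(e.e)   # First or Last


E1 = Event("w1", (("?x", ":p", "?y"),))
E2 = Event("w2", (("?y", ":q", "?z"),))
query_pattern = Match(Seq(E1, E2))
```

Frozen dataclasses keep patterns immutable and hashable, which makes them convenient as keys when caching evaluation results.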

### 3.4 Query Definition

Having all building blocks, it is now possible to define RSEP-QL queries.

### Definition 5

An *RSEP-QL query* *Q* is defined as \(( SE , SDS , ET , QF )\), where \( SE \) is an RSEP-QL algebraic expression, \( SDS \) is an RDF streaming dataset, \( ET \) is the sequence of time instants on which the evaluation occurs, and \( QF \) is the Query Form.

The continuous evaluation paradigm is captured in the query signature through the set \( ET \) of execution times. Intuitively, this set represents the time instants on which the algebraic expression evaluation may occur. Note that this set is not explicitly defined by the query and in general it may be unknown at query registration time (as it can depend on the streaming content). In practice, *ET* can be expressed through report policies [8], which define rules to trigger the query evaluation. For example, C-SPARQL can be captured by a window close report policy, i.e., evaluations are periodic and determined by the window definition. EP-SPARQL and CQELS are regulated by a content change report policy, i.e., evaluations occur every time a new item appears on the stream.
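To make the two report policies more concrete, the sketch below derives a finite prefix of \( ET \) under each of them. It is a simplification under explicit assumptions: "window close" is read as "every multiple of the sliding step \(\beta \)", "content change" as "every distinct timestamp at which an item arrives", and both function names are hypothetical.

```python
from typing import List, Tuple

Item = Tuple[str, int]   # (graph name, timestamp)


def window_close_times(beta: int, horizon: int) -> List[int]:
    """Assumed window close policy: evaluate at every multiple of beta up to horizon."""
    return list(range(beta, horizon + 1, beta))


def content_change_times(stream: List[Item]) -> List[int]:
    """Assumed content change policy: evaluate whenever a new item appears."""
    return sorted({t for _, t in stream})


S = [("G1", 2), ("G2", 4), ("G3", 6), ("G4", 8), ("G5", 10)]

et_window_close = window_close_times(beta=4, horizon=10)
et_content_change = content_change_times(S)
```

The first policy depends only on the window definition, while the second depends on the stream content, which is why \( ET \) may be unknown at query registration time.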

### Example 4

This example presents an RSEP-QL query with CEP features. The \(\mathrm {MATCH}\) clause describes an event pattern \((E_1~\mathrm {SEQ}~E_2)\), where the BEPs \(E_1\) and \(E_2\) are defined on the respective landmark and time-based windows from Example 3. Their patterns are: \(E_1=\mathrm {EVENT}~w_1~({? x }~{:\!\! p }~{? y })\) and \(E_2=\mathrm {EVENT}~w_2~({? y }~{:\!\! q }~{? z })\).

## 4 RSEP-QL Semantics

We now proceed to define the evaluation semantics of the operators introduced in Sect. 3.3. Sections 4.1 and 4.2 present the semantics of the graph pattern and event pattern operators, respectively. Sections 4.3 and 4.4 address CEP selection and consumption policies to completely capture settings such as *chronological* and *recent* of EP-SPARQL, or the naive sequencing of triples based on their last appearances like in C-SPARQL.

### 4.1 Graph Pattern Evaluation Semantics

To cope with graph-based RDF streams, we adapt the graph pattern evaluation semantics from [14]. There, the evaluation semantics of a SPARQL operator is defined as a function that takes as input a graph pattern *P* and a SPARQL dataset \( DS \) having a default RDF graph *G*, and produces bags of solution mappings: partial functions that map variables to RDF terms. It is usually denoted as \(\llbracket P \rrbracket _{ DS (G)}\).

The RSEP-QL evaluation semantics of graph patterns considers the evaluation time instants and redefines the active graph notion. Given an RSEP-QL dataset \( SDS \) and an identifier \(\iota \in ids (SDS)\) of one of its elements, we name *temporal sub-dataset*, denoted by \( SDS _\iota \), the active element of the dataset. The active element is \( SDS _\iota =\overline{G}_i\) if \((\iota =g_i,\overline{G}_i) \in SDS \), or \( SDS _\iota =W_j(S_\ell )\) if \((\iota =w_j,W_j(S_\ell )) \in SDS \).

### Definition 6

**(Graph Pattern Evaluation Semantics).** Given an RSEP-QL pattern *P*, an active time-varying graph or window identified by \(\iota \in ids ( SDS )\) of a streaming dataset \( SDS \), and an evaluation time instant *t*, we define

\[\llbracket P \rrbracket ^{t}_{ SDS _\iota }\]

as the *evaluation* of *P* at *t* over the active element \(\iota \) in \( SDS \).

We now briefly summarize the evaluation semantics of the graph patterns available in SPARQL, with a special focus on BGP and window graph patterns from Sect. 3.3.

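**Basic Graph Pattern.** BGP evaluation in SPARQL is one of the few cases in which there is an actual access to the data stored in the active RDF graph. The idea behind the evaluation of BGPs in RSEP-QL is to exploit the SPARQL evaluation semantics. To make it possible, it is necessary to move from the active element \(\iota \) of \( SDS \) and the evaluation time instant *t* to an RDF graph over which the BGP can be evaluated. We name this RDF graph the *snapshot of a temporal sub-dataset* at *t*, and it is defined as:

\[ SDS _\iota (t)={\begin{cases} \overline{G}_i(t) & \text {if } SDS _\iota =\overline{G}_i,\\ \bigcup _{(G_k,t_k)\in W_j(S_\ell ,t)}G_k & \text {if } SDS _\iota =W_j(S_\ell ). \end{cases}}\]

Given the snapshot, we define the evaluation of a BGP *P* as:

\[\llbracket P \rrbracket ^{t}_{ SDS _\iota }=\llbracket P \rrbracket _{ SDS _\iota (t)}\]

### Example 5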

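**Other SPARQL Graph Patterns.** For other graph patterns, we maintain the idea of SPARQL of defining them recursively [16]. For example, the graph pattern \(P_1~ Join ~P_2\) is evaluated as:

\[\llbracket P_1~ Join ~P_2 \rrbracket ^{t}_{ SDS _\iota }=\llbracket P_1 \rrbracket ^{t}_{ SDS _\iota }\bowtie \llbracket P_2 \rrbracket ^{t}_{ SDS _\iota }\]

where both sub-patterns are evaluated at the same time instant *t* with regards to the active part \( SDS _\iota \) of the RDF streaming dataset \( SDS \).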

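**Window Graph Pattern.** Finally, we define the evaluation semantics of the window graph patterns. Given a window identifier \(w_j\) and a graph pattern *P*, we have that:

\[\llbracket \mathrm {WINDOW}~w_j~P \rrbracket ^{t}_{ SDS _\iota }=\llbracket P \rrbracket ^{t}_{ SDS _{w_j}}\]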
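The graph pattern semantics of this section can be sketched in a few lines of Python: the content of a window is merged into a single snapshot graph, over which BGPs are evaluated in the usual SPARQL manner. This is a simplified illustration with hypothetical names and placeholder triples; it ignores blank nodes, filters, and bag multiplicities beyond what plain list processing provides.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]
Content = List[Tuple[FrozenSet[Triple], int]]   # window content: (G_k, t_k) items


def match_triple(pattern: Triple, triple: Triple, mu: Mapping) -> Optional[Mapping]:
    """Extend mu so that pattern matches triple, or return None if impossible."""
    mu = dict(mu)
    for p, v in zip(pattern, triple):
        if p.startswith("?"):
            if mu.setdefault(p, v) != v:    # variable already bound to another term
                return None
        elif p != v:
            return None
    return mu


def eval_bgp(bgp: List[Triple], graph: Set[Triple]) -> List[Mapping]:
    """Evaluate a BGP over one RDF graph, one triple pattern at a time."""
    solutions: List[Mapping] = [{}]
    for pattern in bgp:
        solutions = [m for mu in solutions for tr in graph
                     if (m := match_triple(pattern, tr, mu)) is not None]
    return solutions


def snapshot(content: Content) -> Set[Triple]:
    """Merge all graphs of a window content into one RDF graph."""
    merged: Set[Triple] = set()
    for g, _ in content:
        merged |= g
    return merged


def join(o1: List[Mapping], o2: List[Mapping]) -> List[Mapping]:
    """Join two bags of solution mappings on their shared variables."""
    return [{**m1, **m2} for m1 in o1 for m2 in o2
            if all(m1[v] == m2[v] for v in m1.keys() & m2.keys())]


# Placeholder window content (assumption), e.g. a landmark window evaluated at t = 8.
content: Content = [
    (frozenset({(":a1", ":p", ":b1")}), 2),
    (frozenset({(":a2", ":p", ":b2")}), 4),
    (frozenset({(":b1", ":q", ":c1")}), 6),
    (frozenset({(":b2", ":q", ":c2")}), 8),
]
g = snapshot(content)
joined = join(eval_bgp([("?x", ":p", "?y")], g), eval_bgp([("?y", ":q", "?z")], g))
```

Since the snapshot discards timestamps, the join above pairs triples regardless of their arrival order; ordering constraints require the event patterns of Sect. 4.2.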

### Example 6

### 4.2 Event Pattern Evaluation Semantics

Similarly to Sect. 4.1, we define the evaluation semantics of event pattern operators by decomposing complex patterns into simple ones. The main difference is that this decomposition process should take into account the temporal aspects related to event matching, i.e., the evaluation should (i) produce time-annotated solution mappings, and (ii) control the time range in which a subpattern is processed. We address (i) by defining the notion of *event mapping* as a triple \((\mu ,t_1,t_2)\) composed of a solution mapping and two time instants \(t_1\) and \(t_2\), representing the initial and final time instants that justify the matching, respectively. We assume that a partial order \(\prec \) to compare timestamps is given. Depending on the particular application, a specific ordering can be chosen. Regarding (ii), we associate the evaluation with an active window function that sets the boundaries of the valid ranges for evaluating event patterns.

### Definition 7

**(Event Pattern Evaluation Semantics).** Given an event pattern *E*, a window function *W* (*active window*), and an evaluation time instant \(t\in ET \), we define

\[\langle \!\langle E \rangle \!\rangle ^{t}_{W}\]

as the *evaluation* of *E* in the scope defined by *W* at *t*.

Different from graph pattern evaluation semantics, in this case there is no explicit reference to data. This information is carried in the basic event patterns defined below.

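**Basic Event Patterns.** Similar to BGPs, Basic Event Patterns (BEP) are the simplest building block. The idea behind their semantics is to produce a set of SPARQL BGP evaluations over the stream items from a snapshot of a temporal sub-dataset (identified by \(w_j\)), restricted by the active window function *W*:

\[\langle \!\langle \mathrm {EVENT}~w_j~P \rangle \!\rangle ^{t}_{W}=\{(\mu ,t_k,t_k)\mid \mu \in \llbracket P \rrbracket _{G_k}\wedge (G_k,t_k)\in W\bullet W_j(S_\ell ,t)\}\]

where \(W_j(S_\ell )\) is the window identified by \(w_j\) in \( SDS \).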

### Example 7

*S*, we have:

It is worth comparing the evaluation semantics of a BEP with the one of a BGP as defined in Sect. 4.1. They both exploit the SPARQL BGP evaluation, but while the former defines an evaluation for each stream item (i.e., an RDF graph), the latter is a unique evaluation over the merge of the stream items in one RDF graph.

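**Other Event Patterns.** Next is the semantics of other event patterns, starting with those that identify the *first* and *last* event matching a pattern, based on the ordering \(\prec \):

\[\langle \!\langle \mathrm {FIRST}~E \rangle \!\rangle ^{t}_{W}=\{(\mu ,t_1,t_2)\in \langle \!\langle E \rangle \!\rangle ^{t}_{W}\mid \not \exists (\mu ',t_1',t_2')\in \langle \!\langle E \rangle \!\rangle ^{t}_{W}:(t_1',t_2')\prec (t_1,t_2)\}\]

\[\langle \!\langle \mathrm {LAST}~E \rangle \!\rangle ^{t}_{W}=\{(\mu ,t_1,t_2)\in \langle \!\langle E \rangle \!\rangle ^{t}_{W}\mid \not \exists (\mu ',t_1',t_2')\in \langle \!\langle E \rangle \!\rangle ^{t}_{W}:(t_1,t_2)\prec (t_1',t_2')\}\]

Let us now consider the \(\mathrm {SEQ}\) operator. The evaluation of \(E_1~\mathrm {SEQ}~E_2\) is defined as:

\[\langle \!\langle E_1~\mathrm {SEQ}~E_2 \rangle \!\rangle ^{t}_{W}=\{(\mu _1\cup \mu _2,t_1,t_4)\mid (\mu _2,t_3,t_4)\in \langle \!\langle E_2 \rangle \!\rangle ^{t}_{W}\wedge (\mu _1,t_1,t_2)\in \langle \!\langle \mu _2(E_1) \rangle \!\rangle ^{t}_{W^\sqcup [0,t_3-1]\bullet W}\}\qquad (7)\]

Intuitively, for each event mapping \((\mu _2,t_3,t_4)\) that matches \(E_2\), Eq. (7) seeks (a) *compatible* and (b) *preceding* event mappings matching \(E_1\). The two demands are guaranteed by introducing constraints on the evaluation of \(E_1\):

(a) is imposed by substituting, in \(E_1\), the variables shared with \(E_2\) by their values from \(\mu _2\); the result is denoted by \(\mu _2(E_1)\).

(b) is ensured by restricting the time range on which input graphs are used to match \(\mu _2(E_1)\): we only consider graphs appearing before \(t_3\), thus \(W^\sqcup [0,t_3-1]\bullet W\).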

### Example 8

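*(cont’d).* We show how \(\langle \!\langle E_1~\mathrm {SEQ}~E_2 \rangle \!\rangle ^{8}_{W^{ id }}\) is evaluated. For the event mapping \((\mu _2,6,6)\) of \(E_2\), where \(\mu _2=\{{? y }\mapsto {:\!\! b _1},{? z }\mapsto {:\!\! c _1}\}\), we then evaluate \(\langle \!\langle \mu _2(E_1) \rangle \!\rangle ^{8}_{W^\sqcup [0,5]\bullet W^{ id }}\) with \(\mu _2(E_1)=\mathrm {EVENT}~w_1~({? x }~{:\!\! p }~{:\!\! b _1})\). Similar to Example 7, we first see that \(W^\sqcup [0,5]\bullet W^\lambda _1(S,8)=(G_1,2),(G_2,4)\). Then, evaluating \(\llbracket {? x }~{:\!\! p }~{:\!\! b _1} \rrbracket _{G_k}\) for \(k=1,2\) yields a match only in \(G_1\). Therefore, the mapping satisfying conditions (a) and (b) is \((\mu _1,t_1,t_2)=(\{{? x }\mapsto {:\!\! a _1},{? y }\mapsto {:\!\! b _1}\},2,2)\). Finally, Eq. (7) gives us \((\{{? x }\mapsto {:\!\! a _1},{? y }\mapsto {:\!\! b _1},{? z }\mapsto {:\!\! c _1}\},2,6)\).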

Similarly, with \((\mu _2',6,6)\) and \((\mu _2',8,8)\) from Example 7, we find a compatible and preceding match \((\{{? x }{\mapsto }{:\!\! a _2},{? y }{\mapsto }{:\!\! b _2}\},4,4)\) for \(E_1\). This gives us two more results: \((\{{? x }{\mapsto }{:\!\! a _2},{? y }{\mapsto }{:\!\! b _2},{? z }{\mapsto }{:\!\! c _2}\},4,8)\) and \((\{{? x }{\mapsto }{:\!\! a _2},{? y }{\mapsto }{:\!\! b _2},{? z }{\mapsto }{:\!\! c _2}\},6,8)\). \(\square \)
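The per-item evaluation of BEPs and the unrestricted sequencing described by conditions (a) and (b) can be sketched as follows. This is a minimal illustration under stated assumptions: each BEP consists of a single triple pattern, the initial active window is the identity window, all function names are hypothetical, and the stream uses placeholder triples rather than the content of Fig. 1.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]
EventMapping = Tuple[Mapping, int, int]           # (mu, t_start, t_end)
Content = List[Tuple[FrozenSet[Triple], int]]     # window content: (G_k, t_k) items


def match(pattern: Triple, triple: Triple) -> Optional[Mapping]:
    """Match one triple pattern against one triple."""
    mu: Mapping = {}
    for p, v in zip(pattern, triple):
        if p.startswith("?"):
            if mu.setdefault(p, v) != v:
                return None
        elif p != v:
            return None
    return mu


def substitute(pattern: Triple, mu: Mapping) -> Triple:
    """mu(E1): replace the variables bound in mu by their values."""
    return tuple(mu.get(term, term) for term in pattern)


def eval_bep(pattern: Triple, content: Content) -> List[EventMapping]:
    """One evaluation per stream item (G_k, t_k), yielding (mu, t_k, t_k)."""
    return [(mu, t, t) for g, t in content for tr in sorted(g)
            if (mu := match(pattern, tr)) is not None]


def eval_seq(p1: Triple, content1: Content,
             p2: Triple, content2: Content) -> List[EventMapping]:
    """Unrestricted sequencing: compatible and preceding matches of E1 for each match of E2."""
    out: List[EventMapping] = []
    for mu2, t3, t4 in eval_bep(p2, content2):
        before = [(g, t) for g, t in content1 if t <= t3 - 1]   # interval window [0, t3 - 1]
        for mu1, t1, _ in eval_bep(substitute(p1, mu2), before):
            out.append(({**mu1, **mu2}, t1, t4))
    return out


# Placeholder stream (assumption).
S: Content = [
    (frozenset({(":a1", ":p", ":b1")}), 2),
    (frozenset({(":a2", ":p", ":b2")}), 4),
    (frozenset({(":b1", ":q", ":c1")}), 6),
    (frozenset({(":b2", ":q", ":c2")}), 8),
]
w1_at_8 = [(g, t) for g, t in S if 1 <= t <= 8]       # landmark window, t0 = 1
w2_at_8 = [(g, t) for g, t in S if 8 - 5 < t <= 8]    # time-based window, (alpha, beta) = (5, 1)
results = eval_seq(("?x", ":p", "?y"), w1_at_8, ("?y", ":q", "?z"), w2_at_8)
```

On this placeholder stream, the sketch returns one event mapping spanning the instants 2 to 6 and one spanning 4 to 8. In contrast to the snapshot-based evaluation of Sect. 4.1, a `:q` triple arriving before its `:p` counterpart would not produce a match.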

**Event Graph Pattern.** Finally, we define the semantics of the \(\mathrm {MATCH}\) operator. Being a graph pattern, its evaluation semantics is defined through the function in Definition 6. Intuitively, the function acts to remove the time annotations from event mappings and to produce a bag of solution mappings. Thus, the result of this operator can be combined with results of other graph pattern evaluations (i.e., other bags of solution mappings):

\[\llbracket \mathrm {MATCH}~E \rrbracket ^{t}_{ SDS _\iota }=\{\mu \mid (\mu ,t_1,t_2)\in \langle \!\langle E \rangle \!\rangle ^{t}_{W^{ id }}\}\]

The initial active window function for *E* is \(W^{ id }\), which imposes no time restriction. Such restrictions can appear later through CEP operators, as in Eq. (7).

### Example 9

### 4.3 Event Selection Policies

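Evaluating the \(\mathrm {SEQ}\) operator as in Eq. (7) takes into account all possible matches from the two sub-patterns. This kind of evaluation captures only the *unrestricted* behavior of EP-SPARQL and C-SPARQL. With the purpose of formally capturing the CEP semantics of C-SPARQL and EP-SPARQL, we introduce in this section different versions of the sequencing operator that allow different ways of selecting stream items to perform matching, known as *selection policies*.

To capture the *naive* CEP behavior of C-SPARQL, the operator \(\mathrm {SEQ}^{ n }\) of Eq. (9) simply picks the two latest event mappings that match the two sub-patterns and compares their associated timestamps.

For the *chronological* and *recent* settings from EP-SPARQL, we need more involved operators \(\mathrm {SEQ}^{ c }\) and \(\mathrm {SEQ}^{ r }\). In the sequel, let \(W^{\star }=W^\sqcup [0,t_3-1]\bullet W\) and, analogously, \(W^{\star \prime }=W^\sqcup [0,t_3'-1]\bullet W\):

\[\langle \!\langle E_1~\mathrm {SEQ}^{ c }~E_2 \rangle \!\rangle ^{t}_{W}=\{(\mu _1\cup \mu _2,t_1,t_4)\mid (\mu _2,t_3,t_4)\in \langle \!\langle E_2 \rangle \!\rangle ^{t}_{W}\wedge \langle \!\langle \mu _2(E_1) \rangle \!\rangle ^{t}_{W^{\star }}\ne \emptyset \wedge (\mu _1,t_1,t_2)\in \langle \!\langle \mathrm {FIRST}~\mu _2(E_1) \rangle \!\rangle ^{t}_{W^{\star }}\wedge \not \exists (\mu _2',t_3',t_4')\in \langle \!\langle E_2 \rangle \!\rangle ^{t}_{W}:(\langle \!\langle \mu _2'(E_1) \rangle \!\rangle ^{t}_{W^{\star \prime }}\ne \emptyset \wedge (t_3',t_4')\prec (t_3,t_4))\}\qquad (10)\]

Compared to (7), Eq. (10) selects an event mapping \((\mu _2,t_3,t_4)\) of \(E_2\) that:

- has a compatible event mapping in \(E_1\) which appeared before \(\mu _2\). This is guaranteed by the condition \(\langle \!\langle \mu _2(E_1) \rangle \!\rangle ^{t}_{W^{\star }}\ne \emptyset \) and the window function \(W^{\star }=W^\sqcup [0,t_3-1]\bullet W\);
- is the first of such event mappings. This is ensured by stating that no such \((\mu _2',t_3',t_4')\) exists, where \((t_3',t_4')\prec (t_3,t_4)\).

Once \((\mu _2,t_3,t_4)\) is found, \((\mu _1,t_1,t_2)\) is taken from \(\langle \!\langle \mathrm {FIRST}~\mu _2(E_1) \rangle \!\rangle ^{t}_{W^{\star }}\), which makes sure that it is the first compatible event that appeared before \((\mu _2,t_3,t_4)\). Finally, the output event matching \(E_1~\mathrm {SEQ}^{ c }~E_2\) is \((\mu _1\cup \mu _2,t_1,t_4)\).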
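The difference between the unrestricted and the chronological selection can be illustrated on precomputed event mappings. The sketch below is a simplification under explicit assumptions: the sub-patterns have already been evaluated into lists of event mappings, "preceding" is checked as \(t_2<t_3\) instead of re-evaluating \(E_1\) over a restricted window, the ordering \(\prec \) is taken to be the lexicographic order on pairs of time instants, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Mapping = Dict[str, str]
EventMapping = Tuple[Mapping, int, int]   # (mu, t_start, t_end)


def compatible(m1: Mapping, m2: Mapping) -> bool:
    """Two mappings are compatible if they agree on their shared variables."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def preceding(e1: List[EventMapping], m2: Mapping, t3: int) -> List[EventMapping]:
    """Matches of E1 that are compatible with m2 and end before t3."""
    return [(m1, t1, t2) for m1, t1, t2 in e1 if compatible(m1, m2) and t2 < t3]


def seq_unrestricted(e1: List[EventMapping], e2: List[EventMapping]) -> List[EventMapping]:
    """All combinations of a match of E2 with a compatible, preceding match of E1."""
    return [({**m1, **m2}, t1, t4)
            for m2, t3, t4 in e2 for m1, t1, _ in preceding(e1, m2, t3)]


def seq_chronological(e1: List[EventMapping], e2: List[EventMapping]) -> List[EventMapping]:
    """Earliest selectable match of E2, combined with its earliest preceding match of E1."""
    selectable = [(m2, t3, t4) for m2, t3, t4 in e2 if preceding(e1, m2, t3)]
    out: List[EventMapping] = []
    for m2, t3, t4 in selectable:
        if any((u3, u4) < (t3, t4) for _, u3, u4 in selectable):
            continue                       # an earlier selectable match of E2 exists
        prec = preceding(e1, m2, t3)
        first = min((t1, t2) for _, t1, t2 in prec)
        out += [({**m1, **m2}, t1, t4) for m1, t1, t2 in prec if (t1, t2) == first]
    return out


# Placeholder event mappings (assumption).
e1 = [({"?x": ":a1", "?y": ":b1"}, 2, 2), ({"?x": ":a2", "?y": ":b2"}, 4, 4)]
e2 = [({"?y": ":b1", "?z": ":c1"}, 6, 6), ({"?y": ":b2", "?z": ":c2"}, 8, 8)]

all_matches = seq_unrestricted(e1, e2)
first_match = seq_chronological(e1, e2)
```

Here the unrestricted variant returns both sequences, while the chronological variant keeps only the one whose second event occurred first.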

### Example 10

### 4.4 Event Consumption Policies

Selection policies are not sufficient to capture the behavior of EP-SPARQL in the chronological and recent settings. As described in Sect. 2, under these settings, stream items that contribute to an answer are not considered in the following evaluation iterations. We complete the model by formalizing this feature, known as *consumption policies*.

Let \( ET =t_1,t_2,\ldots ,t_n,\ldots \) be the set of evaluation instants. Abusing notation, we say that a window function \(w_j\) *appears* in an event pattern *E*, denoted by \(w_j\hat{\in }E\), if *E* contains a basic event pattern of the form \((\mathrm {EVENT}~w_j~P)\).

Consumption policies determine the input available to each evaluation. Definition 8 is about a possible input for the evaluation, while Definition 9 is about the new incoming input. We first define such notions for a window in an RDF streaming dataset, and then lift them to the level of structures that refer to all windows in an event pattern.

### Definition 8

**(Potential Input and Input Structure).** Given an RDF streaming dataset \( SDS \), we denote by \(I_i(w_j)\subseteq SDS _{w_j}(t_i)\) a *potential input* at time \(t_i\) of the window identified by \(w_j\). For initialization purposes, we let \(I_0(w_j)=\emptyset \).

Given an event pattern *E*, an *input structure* \(I_i\) of *E* at time \(t_i\) is a set of potential inputs at \(t_i\) of all windows appearing in *E*, i.e., \(I_i=\{I_i(w_j)\mid w_j\hat{\in }E\}\).

### Definition 9

**(Delta Input Structure).** Given an RDF streaming dataset \( SDS \) and two consecutive evaluation times \(t_{i-1}\) and \(t_i\), where \(i>1\), the new triples arriving at a window \(w_j\) are called a *delta input*, denoted by \(\varDelta _i(w_j)= SDS _{w_j}(t_i)\setminus SDS _{w_j}(t_{i-1})\). For initialization purposes, let \(\varDelta _1(w_j)= SDS _{w_j}(t_1)\).

Given an event pattern *E*, a *delta input structure* at time \(t_i\) is a set of delta inputs at \(t_i\) of all windows appearing in *E*, i.e., \(\varDelta _i=\{\varDelta _i(w_j)\mid w_j\hat{\in }E\}\).
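Definition 9 amounts to a set difference between consecutive snapshots of a window. The following sketch computes the delta inputs of one window over a sequence of evaluation times; the function name and the placeholder triples are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def delta_inputs(snapshots: List[Set[Triple]]) -> List[Set[Triple]]:
    """Given the snapshots of a window at t_1, t_2, ..., return Delta_1, Delta_2, ..."""
    deltas: List[Set[Triple]] = []
    previous: Set[Triple] = set()          # makes Delta_1 equal to the first snapshot
    for current in snapshots:
        deltas.append(set(current) - previous)
        previous = set(current)
    return deltas


# Placeholder snapshots (assumption) of a landmark window at two evaluation times.
at_t1 = {(":a1", ":p", ":b1"), (":a2", ":p", ":b2"), (":b1", ":q", ":c1")}
at_t2 = at_t1 | {(":b2", ":q", ":c2")}

deltas = delta_inputs([at_t1, at_t2])
```

For a sliding window, triples that leave the window between two evaluations do not show up in the delta, since only additions are recorded.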

We can now define consumption policies in a generic sense.

### Definition 10

**(Consumption Policy and Valid Input Structure).** A *consumption policy function* \(\mathcal {P}\) takes an event pattern *E*, a time instant \(t_i\in ET \), and a vector of additional parameters \({{\varvec{p}}}\) depending on the specific policy, and produces an input structure for *E*.

The resulting input structure is called *valid* if it is returned by applying \(\mathcal {P}\) on a set of valid parameters \({{\varvec{p}}}\), where the validity of \({{\varvec{p}}}\) is defined based on each specific policy.

*E* at \(t_{i-1}\). The validity of the input can be guaranteed by starting the evaluation with \(I_1(w_j)= SDS _{w_j}(t_1)\), which is valid by definition. For the formal description of \(\mathcal {P}^c\) and \(\mathcal {P}^r\), we refer the reader to the extended version of the paper.\(^{7}\)
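To make the shape of a consumption policy function concrete, the following sketch fixes one possible interface, following the argument list \(\mathcal {P}(E,t_i,I_{i-1},\varDelta _i)\) that appears in this section. The policy shown is a hypothetical one that consumes nothing; it is neither \(\mathcal {P}^c\) nor \(\mathcal {P}^r\), whose definitions are given only in the extended version. The sketch also assumes that the event pattern is reduced to the names of its windows and that window contents only grow over time (as with landmark windows), so that the result stays within the current window content.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

Triple = Tuple[str, str, str]
InputStructure = Dict[str, FrozenSet[Triple]]   # window name -> potential input

# Assumed interface: P(E, t_i, I_{i-1}, Delta_i) -> I_i, with the event
# pattern E represented only by the names of the windows appearing in it.
Policy = Callable[
    [Iterable[str], int, InputStructure, InputStructure], InputStructure
]


def no_consumption_policy(
    windows: Iterable[str],
    t_i: int,
    prev_input: InputStructure,
    delta: InputStructure,
) -> InputStructure:
    """Hypothetical policy: I_i(w) is the union of I_{i-1}(w) and Delta_i(w).

    Nothing is ever consumed; t_i is accepted only to match the interface.
    """
    empty: FrozenSet[Triple] = frozenset()
    return {w: prev_input.get(w, empty) | delta.get(w, empty) for w in windows}


# Hypothetical data: one window, one old triple, one newly arrived triple.
post = (":alice", ":posts", ":p1")
follow = (":bob", ":follows", ":alice")
i_1 = {"w1": frozenset({post})}
delta_2 = {"w1": frozenset({follow})}

policy: Policy = no_consumption_policy
i_2 = policy(["w1"], 2, i_1, delta_2)
```

A policy that actually consumes input would return a strict subset of this union; the interface would stay the same.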

*E* with [formula missing in source]. Then, when the evaluation process reaches a BEP at the leaves of the operator tree, \(\mathcal {P}\) is used to filter out already consumed input. Formally: [equation missing in source] where \(\mathcal {I}=I_i(w_j)\cap (\bigcup _{(G_k,t_k)\in W\bullet W_j(S_\ell ,t_i)}G_k)\) and \(I_i(w_j)\in I_i=\mathcal {P}(E,t_i,I_{i-1},\varDelta _i)\).
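The set \(\mathcal {I}\) defined above intersects the potential input of a window with the union of the graphs in that window's content. A minimal sketch of this computation follows; it assumes that the window content is available as a list of timestamped graphs \((G_k,t_k)\), each graph being a set of triples, and the function name `unconsumed_input` is hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
TimestampedGraph = Tuple[Graph, int]   # (G_k, t_k)


def unconsumed_input(
    potential_input: FrozenSet[Triple],
    window_content: Iterable[TimestampedGraph],
) -> FrozenSet[Triple]:
    """Intersect I_i(w_j) with the union of all graphs G_k in the window."""
    in_window = frozenset().union(*(graph for graph, _ in window_content))
    return potential_input & in_window


# Hypothetical data: triple `a` was consumed earlier, so the policy left it
# out of the potential input even though it is still in the window.
a = (":alice", ":posts", ":p1")
b = (":bob", ":follows", ":alice")
c = (":carol", ":mentions", ":bob")

window = [(frozenset({a, b}), 1), (frozenset({c}), 2)]
potential = frozenset({b, c})

remaining = unconsumed_input(potential, window)
```

Only triples that are both in the window and still part of the potential input reach the evaluation of the basic event pattern.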

### Example 11

## 5 Conclusions and Outlook

**Table 1.** Coverage of DSMS/CEP features of RSEP-QL compared to EP-SPARQL and C-SPARQL

RSEP-QL | EP-SPARQL/C-SPARQL |
---|---|
\(W^\lambda + \mathrm {SEQ}\) | EP-SPARQL unrestricted |
\(W^\lambda + \mathrm {SEQ}^{ c }+ \mathcal {P}^c\) | EP-SPARQL chronological |
\(W^\lambda + \mathrm {SEQ}^{ r }+ \mathcal {P}^r\) | EP-SPARQL recent |
\(W^\tau + \mathrm {SEQ}^{ n }\) | C-SPARQL SEQ (timestamp) |
\(W^\tau \) | C-SPARQL time-window |

We have also shown that RSEP-QL complies with the set of requirements described in Sect. 2. First, it processes RDF graph-based streams [R1]. It also captures the DSMS features of representative RSP languages [R2], a capability inherited from the expressivity of RSP-QL. Moreover, RSEP-QL captures the behavior of the sequential event pattern matching features of EP-SPARQL and C-SPARQL [R3], including the different selection and consumption policies that they provide. Table 1 shows the equivalence of the main features in RSEP-QL with their counterparts in EP-SPARQL and C-SPARQL. For instance, one can observe that an EP-SPARQL sequence pattern (with recent policy) can be captured by the \(\mathrm {SEQ}^{ r }\) operator and the \(\mathcal {P}^r\) function on a landmark window in RSEP-QL.

Our formalization captures a rich set of operators, including time-based sliding windows and event patterns such as sequencing, and combines them. As a result, RSEP-QL offers expressivity beyond the capabilities of current RSPs. For example, RSEP-QL allows defining event patterns over more than one stream, e.g., given \(E_1~\mathrm {SEQ}~E_2\), \(E_1\) and \(E_2\) can match over different streams. This cannot be expressed with an EP-SPARQL or C-SPARQL query, as the former operates on a single stream, while the latter merges different input streams into a single one.

Furthermore, the expressivity of RSEP-QL allows defining complex queries that combine both windows and event patterns. For instance, consider that in a social network we want to find the posts made by a user who is then followed by a popular user, defined as someone who gets a lot of mentions in the last hour and has a lot of followers. In this case, a time window is needed to keep track of the number of mentions in the last hour. A sequence pattern is then required to capture the fact that a user is followed after making a post. Contextual information is used to look up the number of followers of a person, to determine whether that person is popular. Another example consists in enriching the event pattern matching with information from contextual streaming data and other streams.

Future work includes enriching RSEP-QL with more CEP operators, e.g., DURING and NOT, and realizing other CEP selection and consumption policies, e.g., *strict/partition contiguity*, *skip till next match*, and *skip till any match* [1].

Another important aspect of this work is its compatibility with alternative data models. Even though we chose a particular model based on timestamped graphs, it can be converted, or in some cases extended if necessary, to other similar models. For example, data streams with interval timestamps can be easily incorporated into the event pattern evaluation semantics. Finally, the RSEP-QL model can also be helpful for the RSP community, as it provides the most comprehensive query processing model for RDF streams so far. We plan to align our model with the latest proposals of the W3C RSP group, as well as to study how it can be adapted for the different profiles proposed in the RSP abstract model.

## Footnotes

- 1.
- 2.
- 3. Cf. http://goo.gl/pqUSri (last access: July 7, 2016).
- 4. \( def \not \in I\cup L\cup B\) denoting the default graph. See [16] for the definitions of *I*, *L*, *B*.
- 5. Cf. https://www.w3.org/TR/sparql11-query for the whole list.
- 6. We do not tackle here the case where \(w \in I \cup V\), which we leave for future work.
- 7. http://tinyurl.com/ekaw2016-195-ext (Hosted by Google Drive).

### References

- 1. Agrawal, J., Diao, Y., Gyllstrom, D., Immerman, N.: Efficient pattern matching over event streams. In: SIGMOD, pp. 147–160 (2008)
- 2. Anicic, D.: Event processing and stream reasoning with ETALIS. Ph.D. thesis, Karlsruhe Institute of Technology (2011)
- 3. Anicic, D., Fodor, P., Rudolph, S., Stojanovic, N.: EP-SPARQL: a unified language for event processing and stream reasoning. In: WWW, pp. 635–644 (2011)
- 4. Arasu, A., Babu, S., Widom, J.: The CQL continuous query language: semantic foundations and query execution. VLDB J. **15**(2), 121–142 (2006)
- 5. Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and issues in data stream systems. In: PODS, pp. 1–16. ACM (2002)
- 6. Barbieri, D.F., Braga, D., Ceri, S., Della Valle, E., Grossniklaus, M.: C-SPARQL: a continuous query language for RDF data streams. Int. J. Semant. Comput. **4**(1), 3–25 (2010)
- 7. Beck, H., Dao-Tran, M., Eiter, T., Fink, M.: LARS: a logic-based framework for analyzing reasoning over streams. In: AAAI, pp. 1431–1438 (2015)
- 8. Botan, I., Derakhshan, R., Dindar, N., Haas, L.M., Miller, R.J., Tatbul, N.: SECRET: a model for analysis of the execution semantics of stream processing systems. PVLDB **3**(1), 232–243 (2010)
- 9. Calbimonte, J.P., Jeung, H., Corcho, Ó., Aberer, K.: Enabling query technologies for the semantic sensor web. Int. J. Semant. Web Inf. Syst. **8**(1), 43–63 (2012)
- 10. Cugola, G., Margara, A.: Processing flows of information: from data stream to complex event processing. ACM Comput. Surv. **44**(3), 15:1–15:62 (2011)
- 11. Dao-Tran, M., Beck, H., Eiter, T.: Contrasting RDF stream processing semantics. In: Qi, G., Kozaki, K., Pan, J.Z., Yu, S. (eds.) JIST 2015. LNCS, vol. 9544, pp. 289–298. Springer, Heidelberg (2016). doi:10.1007/978-3-319-31676-5_21
- 12. Dao-Tran, M., Le-Phuoc, D.: Towards enriching CQELS with complex event processing and path navigation. In: HiDeSt, pp. 2–14 (2015)
- 13. Dell’Aglio, D., Balduini, M., Della Valle, E.: On the need to include functional testing in RDF stream engine benchmarks. In: 10th ESWC 2013 Conference Workshops: BeRSys 2013, AImWD 2013 and USEWOD 2013 (2013)
- 14. Dell’Aglio, D., Valle, E.D., Calbimonte, J., Corcho, O.: RSP-QL semantics: a unifying query model to explain heterogeneity of RDF stream processing systems. Int. J. Semant. Web Inf. Syst. **10**(4), 17–44 (2014)
- 15. Gutierrez, C., Hurtado, C., Vaisman, A.: Introducing time into RDF. IEEE Trans. Knowl. Data Eng. **19**(2), 207–218 (2007)
- 16. Harris, S., Seaborne, A.: SPARQL 1.1 Query Language (2013). http://www.w3.org/TR/sparql11-query/
- 17. Komazec, S., Cerri, D., Fensel, D.: Sparkwave: continuous schema-enhanced pattern matching over RDF data streams. In: DEBS, pp. 58–68 (2012)
- 18. Luckham, D.C.: The power of events - an introduction to complex event processing in distributed enterprise systems. ACM (2005)
- 19. Le-Phuoc, D., Dao-Tran, M., Xavier Parreira, J., Hauswirth, M.: A native and adaptive approach for unified processing of linked streams and linked data. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011, Part I. LNCS, vol. 7031, pp. 370–388. Springer, Heidelberg (2011)
- 20. Rinne, M., Törmä, S., Nuutila, E.: SPARQL-based applications for RDF-encoded sensor data. In: SSN, vol. 904, pp. 81–96 (2012)