1 Introduction

Several use cases require query results to be kept up to date over time, which may require re-executing entire queries over and over again (i.e., polling). Polling can be very inefficient when the client does not know when the data will change. An additional problem is that many public (even static) SPARQL query endpoints suffer from low availability [2]. This is partially caused by the unrestricted complexity of SPARQL queries [5] combined with the public character of SPARQL endpoints. RDF stream processing engines like C-SPARQL [1] and CQELS [4] offer combined access to dynamic data streams and static background data through continuously executing queries. Because of this continuous querying, the cost for these servers is even higher than with static querying.

In this work we present a client-side RDF stream processing engine based on Triple Pattern Fragments (TPF) [6]. TPF is a low-cost server interface for retrieving the triples that match a triple pattern, which makes it possible for a client to evaluate any query by breaking it up into several triple patterns and joining the results locally. We focus on non-high-frequency dynamic data, for example information on train delays, which updates in the order of minutes, because some dynamic data might have a frequency that is too high for clients to poll efficiently. The resulting framework requires the server to annotate its data with a predicted expiration time. Using this expiration time, the client can efficiently determine when to retrieve fresh data. The generic approach in this paper is applied to the use case of public transit route planning, but it can also be used in various other domains with continuously updating data, such as smart city dashboards, business intelligence, or sensor networks.
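As an illustration of evaluating a query by breaking it into triple patterns and joining locally, the following minimal Python sketch may help. It is not the implementation used in this work: the in-memory `TRIPLES` list stands in for a TPF server, `fetch_fragment` stands in for a single TPF request, and the `ex:` identifiers are invented for the example. The join strategy shown (substituting known bindings into the next pattern) is one simple option among several.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical data standing in for the dataset behind a TPF server.
TRIPLES: List[Triple] = [
    ("ex:train1", "ex:departsFrom", "ex:Ghent"),
    ("ex:train2", "ex:departsFrom", "ex:Bruges"),
    ("ex:train1", "ex:delay", "5"),
    ("ex:train2", "ex:delay", "0"),
]


def is_var(term: str) -> bool:
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")


def fetch_fragment(pattern: Triple, triples: Sequence[Triple] = TRIPLES) -> List[Triple]:
    """Stand-in for one TPF request: all triples matching a single pattern."""
    return [
        t for t in triples
        if all(is_var(p) or p == v for p, v in zip(pattern, t))
    ]


def evaluate(patterns: Sequence[Triple], triples: Sequence[Triple] = TRIPLES) -> List[Binding]:
    """Join the patterns client-side, one pattern at a time."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        next_solutions: List[Binding] = []
        for binding in solutions:
            # Fill in variables that earlier patterns have already bound.
            bound = tuple(binding.get(term, term) for term in pattern)
            for triple in fetch_fragment(bound, triples):
                extended = dict(binding)
                consistent = True
                for term, value in zip(bound, triple):
                    if is_var(term):
                        # A variable repeated within one pattern must agree.
                        if extended.get(term, value) != value:
                            consistent = False
                            break
                        extended[term] = value
                if consistent:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


query = [("?t", "ex:departsFrom", "ex:Ghent"), ("?t", "ex:delay", "?d")]
print(evaluate(query))  # [{'?t': 'ex:train1', '?d': '5'}]
```

In this sketch the server only ever answers single-pattern lookups, while the join itself runs on the client.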

2 Query Streamer

Our solution consists of a partial redistribution of query evaluation workload from the server to the client. This requires the client to be able to access the server data so that the query evaluation can be done client-side. There needs to be a distinction between regular static data and continuously updating dynamic data in the server’s dataset. By annotating dynamic data with a time interval or expiration time using a temporal vocabulary [3], the client can detect for how long a certain fact remains valid. Note that the data may still remain the same after its expiration. Once dynamic data has expired, the client knows that it has to evaluate the query again to fetch the latest version of the data.
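The way a client might use such expiration annotations to decide when to re-evaluate can be sketched as follows. This is a hedged illustration only: the function name `seconds_until_refresh` is hypothetical, and the choice to refresh as soon as the earliest annotated fact expires is an assumption made for the example rather than a description of the actual scheduling policy.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def seconds_until_refresh(expirations: Iterable[datetime], now: datetime) -> Optional[float]:
    """Return how long the client can wait before re-evaluating.

    `expirations` holds the expiration times annotated on the dynamic facts
    used in the current results. Assumption: the query is re-evaluated as
    soon as the earliest of these facts expires.
    """
    expirations = list(expirations)
    if not expirations:
        return None  # no dynamic facts, so nothing to schedule
    earliest = min(expirations)
    # Facts that have already expired call for an immediate refresh.
    return max(0.0, (earliest - now).total_seconds())


now = datetime(2015, 1, 1, 12, 0, tzinfo=timezone.utc)
annotated = [now + timedelta(seconds=90), now + timedelta(seconds=30)]
print(seconds_until_refresh(annotated, now))  # 30.0
```

Compared with polling at a fixed interval, the client here issues no requests while all annotated facts are still valid.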

We have added an extra layer, called the Query Streamer, on top of the TPF client. The Query Streamer splits a regular SPARQL query into a separate static and dynamic query. This rewriting is done by exchanging metadata with the server. The Query Streamer continuously evaluates the dynamic query based on the time annotations it finds on the dynamic data, and exploits these annotations by only initiating new queries when the dynamic facts have expired. Every time results from the dynamic query are finalized, they are combined with the static query to form a materialized static query. This materialized static query will then either be evaluated or its results will be retrieved from a local cache. The combined results of the dynamic query and materialized static query are then continuously returned to the user who initiated the original query.
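The materialization and caching step described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the Query Streamer's actual code: `materialize`, `StaticCache`, and `combine` are hypothetical names, queries are reduced to lists of triple patterns, and the static evaluator is passed in as a plain function.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Pattern = Tuple[str, str, str]
Binding = Dict[str, str]
Materialized = Tuple[Pattern, ...]


def materialize(static_patterns: Sequence[Pattern], binding: Binding) -> Materialized:
    """Fill the variables bound by a dynamic result into the static patterns."""
    return tuple(
        tuple(binding.get(term, term) for term in pattern)
        for pattern in static_patterns
    )


class StaticCache:
    """Evaluates each distinct materialized static query at most once."""

    def __init__(self, evaluate_static: Callable[[Materialized], List[Binding]]):
        self._evaluate = evaluate_static
        self._cache: Dict[Materialized, List[Binding]] = {}
        self.misses = 0  # number of real evaluations, for illustration

    def results_for(self, materialized: Materialized) -> List[Binding]:
        if materialized not in self._cache:
            self.misses += 1
            self._cache[materialized] = self._evaluate(materialized)
        return self._cache[materialized]


def combine(dynamic_results: Sequence[Binding],
            static_patterns: Sequence[Pattern],
            cache: StaticCache) -> List[Binding]:
    """Merge every dynamic result with the matching static results."""
    combined: List[Binding] = []
    for dyn in dynamic_results:
        for stat in cache.results_for(materialize(static_patterns, dyn)):
            combined.append({**dyn, **stat})
    return combined
```

Because static data does not change, a later round of dynamic results that binds the same values can be answered from the cache without contacting the server again.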

3 Preliminary Evaluation

To analyze the effects of our solution, we set up an experiment that measures the impact of the proposed redistribution of workload between client and server by simultaneously executing a set of queries against a TPF server using our solution. We repeat this experiment for two state-of-the-art server-side solutions, C-SPARQL and CQELS, in which the clients simply register the query with the respective server engine and receive a stream of results.

To test the client and server performance, our experiment consists of one server and ten physical clients. Each of these clients can execute from one to twenty unique concurrent queries derived from the query in Listing 1.1. This results in a series of 10 to 200 concurrent query executions.

Fig. 1. Average server and client CPU usage for one query stream for C-SPARQL, CQELS and the proposed solution. Our solution effectively moves complexity from the server to the client.

Our solution was implemented in JavaScript using Node.js to allow for easy communication with the existing TPF client. The tests were executed on machines with two hexacore Intel E5645 (2.4 GHz) CPUs and 24 GB RAM, running Ubuntu 12.04 LTS.

The server performance results from our main experiment can be found in Fig. 1a. This plot shows an increasing CPU usage for C-SPARQL and CQELS for higher numbers of concurrent query executions. Our solution, on the other hand, never reaches more than 1 % of server CPU usage.

Fig. 1b shows the CPU usage of the clients in our main experiment, averaged over all clients that send queries to the server, across the duration of the query evaluation. The clients that send C-SPARQL and CQELS queries to the server have a CPU usage of nearly zero percent for the whole duration of the query evaluation. The clients using the client-side Query Streamer presented in this work have an initial CPU peak reaching about 80 %, which drops to about 5 % after 4 s. This initial peak is caused by the preprocessing done by the Query Streamer.

4 Conclusions

In this paper, we presented a solution for doing client-side query evaluation over dynamic data, with the goal of lowering the server load. Our preliminary evaluation shows that for queries of limited complexity with limited dataset sizes, our solution significantly reduces the server load. This makes it possible for the server to handle many more client requests when compared to alternative approaches. In our experiments, this lower server load comes with a higher client load. Future research should show how this solution performs for larger datasets and different query types. The move from query registration at the server, as is done by C-SPARQL and CQELS, to client-side query evaluation is important for reducing the server load and for being able to publish dynamic data at a low cost.

This low-cost publication of dynamic Linked Data opens up a whole new range of possibilities. Dynamic data that currently requires expensive server infrastructure for its publication to a large number of potential clients, or that is rate-limited to avoid server overloading, can now be exposed through a low-cost interface where clients are required to do part of the work. This can be used, for example, for public access to real-time information on public transport scheduling and non-high-frequency sensors.