Abstract
Traditional RDF stream processing engines work completely server-side, which contributes to a high server cost. To allow a large number of concurrent clients to perform continuous querying, we extend the low-cost Triple Pattern Fragments (TPF) interface with support for time-sensitive queries. In this poster, we give an overview of a client-side RDF stream processing engine on top of TPF. Our experiments show that our solution significantly lowers the server load while increasing the load on the clients. Preliminary results indicate that our solution moves the complexity of continuously evaluating real-time queries from the server to the client, which makes real-time querying much more scalable for a large number of concurrent clients than the alternatives.
1 Introduction
Several use cases require query results that are updated over time, which may require re-executing entire queries over and over again (i.e., polling). Polling can be very inefficient when the client does not know when the data will change. An additional problem is that many public (even static) SPARQL query endpoints suffer from low availability [2]. This is partially caused by the unrestricted complexity of SPARQL queries [5], combined with the public character of SPARQL endpoints. RDF stream processing engines such as C-SPARQL [1] and CQELS [4] offer combined access to dynamic data streams and static background data through continuously executing queries. Because of this continuous querying, the cost for these servers is even higher than with static querying.
In this work, we present a client-side RDF stream processing engine based on Triple Pattern Fragments (TPF) [6]. TPF is a low-cost server interface for retrieving triple patterns, which makes it possible for a client to evaluate any query by breaking it up into several triple patterns and joining them locally. We focus on non-high-frequency dynamic data, for example information on train delays, which updates in the order of minutes, because some dynamic data may change too frequently for clients to poll efficiently. The resulting framework requires the server to annotate its data with a predicted expiration time. Using this expiration time, the client can efficiently determine when to retrieve fresh data. The generic approach in this paper is applied to the use case of public transit route planning. It can also be used in other domains with continuously updating data, such as smart city dashboards, business intelligence, or sensor networks.
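To illustrate how a client can answer a query by breaking it into triple patterns and joining them locally, the following Python sketch evaluates a basic graph pattern against a simulated TPF server. The names `DATASET`, `fetch_fragment`, `extend`, and `evaluate_bgp` are hypothetical, the "server" is an in-memory list, and patterns are processed in the order given; the TPF client described in [6] issues HTTP requests instead and also uses the estimated match counts included in each fragment to decide which pattern to request next.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # terms starting with "?" are variables
Bindings = Dict[str, str]

# Hypothetical in-memory stand-in for the dataset behind a TPF server.
DATASET: List[Triple] = [
    ("ex:train1", "ex:departsFrom", "ex:Ghent"),
    ("ex:train2", "ex:departsFrom", "ex:Brussels"),
    ("ex:train1", "ex:delay", "5"),
    ("ex:train2", "ex:delay", "0"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def fetch_fragment(pattern: Pattern) -> List[Triple]:
    """Simulate one TPF request: all triples matching a single triple pattern."""
    return [t for t in DATASET
            if all(is_var(p) or p == v for p, v in zip(pattern, t))]


def substitute(pattern: Pattern, bindings: Bindings) -> Pattern:
    """Replace variables that are already bound by their values."""
    s, p, o = (bindings.get(term, term) for term in pattern)
    return (s, p, o)


def extend(pattern: Pattern, triple: Triple, bindings: Bindings) -> Optional[Bindings]:
    """Bind the pattern's remaining variables to the triple, or None on conflict."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if result.get(term, value) != value:
                return None  # e.g. a variable used twice with different values
            result[term] = value
    return result


def evaluate_bgp(patterns: List[Pattern],
                 bindings: Optional[Bindings] = None) -> Iterator[Bindings]:
    """Evaluate patterns left to right, one fragment request per bound pattern."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    bound = substitute(patterns[0], bindings)
    for triple in fetch_fragment(bound):
        extended = extend(bound, triple, bindings)
        if extended is not None:
            yield from evaluate_bgp(patterns[1:], extended)


query = [("?train", "ex:departsFrom", "ex:Ghent"), ("?train", "ex:delay", "?delay")]
print(list(evaluate_bgp(query)))  # → [{'?train': 'ex:train1', '?delay': '5'}]
```

Each recursive step sends one fragment request for a partially bound pattern, so the server only ever answers single triple patterns while the client performs the join.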
2 Query Streamer
Our solution partially redistributes the query evaluation workload from the server to the client. This requires the client to be able to access the server data so that query evaluation can be done client-side. The server's dataset must distinguish between regular static data and continuously updating dynamic data. By annotating dynamic data with a time interval or expiration time using a temporal vocabulary [3], the server allows the client to detect how long a certain fact remains valid. Note that the data may still be unchanged after its expiration. Once dynamic data has expired, the client knows that it has to evaluate the query again to fetch the latest version of the data.
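The expiration logic can be sketched as follows. This sketch assumes that the expiration times of the dynamic facts have already been extracted from their temporal annotations as ISO 8601 timestamps with an explicit UTC offset; the function names are hypothetical. Because data may be unchanged after it expires, expiration only tells the client when to re-fetch, not that the result has changed.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def parse_time(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with offset, e.g. '2016-05-30T10:00:30+00:00'."""
    return datetime.fromisoformat(value)


def is_expired(expires_at: str, now: datetime) -> bool:
    """A dynamic fact is no longer guaranteed valid once its expiration time has passed."""
    return parse_time(expires_at) <= now


def seconds_until_reevaluation(expirations: Iterable[str],
                               now: datetime) -> Optional[float]:
    """Delay until the earliest-expiring dynamic fact forces a new evaluation.

    Returns None when there are no dynamic facts, so nothing needs re-evaluating.
    """
    times = [parse_time(e) for e in expirations]
    if not times:
        return None
    return max(0.0, (min(times) - now).total_seconds())


now = datetime(2016, 5, 30, 10, 0, 0, tzinfo=timezone.utc)
expirations = ["2016-05-30T10:02:00+00:00", "2016-05-30T10:00:30+00:00"]
print(seconds_until_reevaluation(expirations, now))  # → 30.0
```

Scheduling on the earliest expiration avoids blind polling: the client waits exactly until some fact it depends on may have changed.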
We added an extra layer, called the Query Streamer, on top of the TPF client. The Query Streamer transforms a regular SPARQL query into separate static and dynamic queries. This rewriting is done by exchanging metadata with the server. The Query Streamer continuously evaluates the dynamic query based on the time annotations it finds on the dynamic data, and exploits these annotations by initiating new queries only when the dynamic facts have expired. Every time results from the dynamic query are finalized, they are combined with the static query to form a materialized static query. This materialized static query is then either evaluated or its results are retrieved from a local cache. The combined results of the dynamic query and the materialized static query are continuously returned to the user who initiated the original query.
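The split-and-materialize step can be sketched as follows. In this sketch, the metadata exchanged with the server is assumed to reduce to a set of predicates marked as dynamic (`DYNAMIC_PREDICATES`, hypothetical), and evaluation of a materialized static query is passed in as a function; all names are illustrative and are not taken from the published implementation.

```python
from typing import Callable, Dict, List, Set, Tuple

Pattern = Tuple[str, str, str]
Bindings = Dict[str, str]
Query = Tuple[Tuple[str, ...], ...]  # hashable form of a materialized query

# Hypothetical: predicates that the server's metadata marks as dynamic.
DYNAMIC_PREDICATES: Set[str] = {"ex:delay"}


def split_query(patterns: List[Pattern]) -> Tuple[List[Pattern], List[Pattern]]:
    """Split a basic graph pattern into its static and dynamic triple patterns."""
    static = [p for p in patterns if p[1] not in DYNAMIC_PREDICATES]
    dynamic = [p for p in patterns if p[1] in DYNAMIC_PREDICATES]
    return static, dynamic


def materialize(static: List[Pattern], bindings: Bindings) -> Query:
    """Fill variables of the static query with values from one dynamic result."""
    return tuple(tuple(bindings.get(term, term) for term in p) for p in static)


class StaticResultCache:
    """Evaluates each materialized static query once and reuses its results."""

    def __init__(self, evaluate: Callable[[Query], List[Bindings]]):
        self.evaluate = evaluate
        self.cache: Dict[Query, List[Bindings]] = {}
        self.evaluations = 0

    def results(self, query: Query) -> List[Bindings]:
        if query not in self.cache:
            self.evaluations += 1
            self.cache[query] = self.evaluate(query)
        return self.cache[query]


def combine(static: List[Pattern], dynamic_results: List[Bindings],
            cache: StaticResultCache) -> List[Bindings]:
    """Join every dynamic result with the results of its materialized static query."""
    combined = []
    for dyn in dynamic_results:
        for stat in cache.results(materialize(static, dyn)):
            combined.append({**dyn, **stat})
    return combined


# Usage with a stubbed static evaluator standing in for the TPF client.
static, dynamic = split_query([("?train", "ex:departsFrom", "?station"),
                               ("?train", "ex:delay", "?delay")])
answers = {(("ex:train1", "ex:departsFrom", "?station"),): [{"?station": "ex:Ghent"}]}
cache = StaticResultCache(lambda q: answers.get(q, []))
first = combine(static, [{"?train": "ex:train1", "?delay": "5"}], cache)
second = combine(static, [{"?train": "ex:train1", "?delay": "7"}], cache)  # cache hit
```

Keying the cache on the materialized query means that a refresh of the dynamic data which leaves the relevant bindings unchanged (here, only the delay changes) does not trigger a new static evaluation.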
3 Preliminary Evaluation
To analyze the effects of our solution, we set up an experiment that measures the impact of the proposed redistribution of workload between client and server by simultaneously executing a set of queries against a TPF server using our solution. We repeat this experiment for two state-of-the-art server-side solutions, C-SPARQL and CQELS, in which the clients simply register the query with the respective server engine and receive a stream of results.
To test client and server performance, our experiment consists of one server and ten physical clients. Each client executes from one to twenty unique concurrent queries derived from the query in Listing 1.1, which results in a series of 10 to 200 concurrent query executions.
Our solution was implemented¹ in JavaScript using Node.js to allow for easy communication with the existing TPF client. The tests² were executed on machines with two hexa-core Intel E5645 (2.4 GHz) CPUs and 24 GB of RAM, running Ubuntu 12.04 LTS.
Fig. 1a shows the server performance results of our main experiment. Server CPU usage increases for C-SPARQL and CQELS as the number of concurrent query executions grows. In contrast, our solution never exceeds 1 % server CPU usage.
Fig. 1b shows the average CPU usage of all clients over the duration of the query evaluation in our main experiment. The clients that send C-SPARQL and CQELS queries to the server have a CPU usage of nearly zero percent for the whole duration of the query evaluation. The clients using the client-side Query Streamer presented in this work show an initial CPU peak of about 80 %, which drops to about 5 % after 4 s. This initial peak is caused by the preprocessing done by the Query Streamer.
4 Conclusions
In this paper, we presented a solution for client-side query evaluation over dynamic data, with the goal of lowering the server load. Our preliminary evaluation shows that for queries of limited complexity over limited dataset sizes, our solution significantly reduces the server load. This allows the server to handle many more client requests than alternative approaches. In our experiments, this lower server load comes at the cost of a higher client load. Future research should show how this solution performs for larger datasets and different query types. Moving from server-side query registration, as done by C-SPARQL and CQELS, to client-side query evaluation is important for reducing the server load and for publishing dynamic data at a low cost.
This low-cost publication of dynamic Linked Data opens up a whole new range of possibilities. Dynamic data that currently requires expensive server infrastructure to be published to a large number of potential clients, or that is rate-limited to avoid overloading the server, can now be exposed through a low-cost interface where clients do part of the work. This can be used, for example, for public access to real-time public transport schedules and to non-high-frequency sensor data.
Notes
1. The source code for this implementation is available at https://github.com/rubensworks/TPFStreamingQueryExecutor.
2. The code used to run these experiments with the relevant queries can be found at https://github.com/rubensworks/TPFStreamingQueryExecutor-experiments/.
References
[1] Barbieri, D.F., Braga, D., Ceri, S., Valle, E.D., Grossniklaus, M.: Querying RDF streams with C-SPARQL. SIGMOD Rec. 39(1), 20–26 (2010)
[2] Buil-Aranda, C., Hogan, A., Umbrich, J., Vandenbussche, P.-Y.: SPARQL web-querying infrastructure: ready for action? In: Alani, H., Kagal, L., Fokoue, A., Groth, P., Biemann, C., Parreira, J.X., Aroyo, L., Noy, N., Welty, C., Janowicz, K. (eds.) ISWC 2013. LNCS, vol. 8219, pp. 277–293. Springer, Heidelberg (2013). doi:10.1007/978-3-642-41338-4_18
[3] Gutierrez, C., Hurtado, C., Vaisman, A.: Introducing time into RDF. IEEE Trans. Knowl. Data Eng. 19(2), 207–218 (2007)
[4] Le-Phuoc, D., Dao-Tran, M., Xavier Parreira, J., Hauswirth, M.: A native and adaptive approach for unified processing of linked streams and Linked Data. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011, Part I. LNCS, vol. 7031, pp. 370–388. Springer, Heidelberg (2011). doi:10.1007/978-3-642-25073-6_24
[5] Pérez, J., Arenas, M., Gutierrez, C.: Semantics and complexity of SPARQL. In: Cruz, I., Decker, S., Allemang, D., Preist, C., Schwabe, D., Mika, P., Uschold, M., Aroyo, L.M. (eds.) ISWC 2006. LNCS, vol. 4273, pp. 30–43. Springer, Heidelberg (2006). doi:10.1007/11926078_3
[6] Verborgh, R., et al.: Querying datasets on the Web with high availability. In: Mika, P., et al. (eds.) ISWC 2014, Part I. LNCS, vol. 8796, pp. 180–196. Springer, Heidelberg (2014). doi:10.1007/978-3-319-11964-9_12
Acknowledgments
The described research activities were funded by iMinds and Ghent University, the Institute for the Promotion of Innovation by Science and Technology in Flanders (IWT), the Fund for Scientific Research Flanders (FWO Flanders), and the European Union. Ruben Verborgh is a Postdoctoral Fellow of the Research Foundation Flanders.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Taelman, R., Verborgh, R., Colpaert, P., Mannens, E. (2016). Moving Real-Time Linked Data Query Evaluation to the Client. In: Sack, H., Rizzo, G., Steinmetz, N., Mladenić, D., Auer, S., Lange, C. (eds) The Semantic Web. ESWC 2016. Lecture Notes in Computer Science, vol 9989. Springer, Cham. https://doi.org/10.1007/978-3-319-47602-5_1
DOI: https://doi.org/10.1007/978-3-319-47602-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47601-8
Online ISBN: 978-3-319-47602-5
eBook Packages: Computer Science, Computer Science (R0)