Elastic and Scalable Processing of Linked Stream Data in the Cloud

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8218)


Linked Stream Data extends the Linked Data paradigm to dynamic data sources. It enables the integration and joint processing of heterogeneous stream data with quasi-static data from the Linked Data Cloud in near real time. Several Linked Stream Data processing engines exist, but their scalability still needs to be improved in terms of (static and dynamic) data sizes, number of concurrent queries, stream update frequencies, etc. So far, none of them supports parallel processing in the Cloud, i.e., elastic load profiles in a hosted environment. To remedy these limitations, this paper presents an approach for elastically parallelizing the continuous execution of queries over Linked Stream Data. For this, we have developed novel, highly efficient, and scalable parallel algorithms for continuous query operators. Our approach and algorithms are implemented in our CQELS Cloud system, and we present extensive evaluations of their superior performance on Amazon EC2, demonstrating their high scalability and excellent elasticity in a real deployment.


Cloud, Linked Data, linked stream processing, continuous queries



Copyright information

© Springer-Verlag Berlin Heidelberg 2013
