A Native and Adaptive Approach for Unified Processing of Linked Streams and Linked Data

  • Danh Le-Phuoc
  • Minh Dao-Tran
  • Josiane Xavier Parreira
  • Manfred Hauswirth
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7031)

Abstract

In this paper we address the problem of scalable, native and adaptive query processing over Linked Stream Data integrated with Linked Data. Linked Stream Data consists of data generated by stream sources, e.g., sensors, enriched with semantic descriptions, following the standards proposed for Linked Data. This enables the integration of stream data with Linked Data collections and facilitates a wide range of novel applications. Currently available systems use a “black box” approach which delegates the processing to other engines such as stream/event processing engines and SPARQL query processors by translating to their provided languages. As the experimental results described in this paper show, the need for query translation and data transformation, as well as the lack of full control over the query execution, pose major drawbacks in terms of efficiency. To remedy these drawbacks, we present CQELS (Continuous Query Evaluation over Linked Streams), a native and adaptive query processor for unified query processing over Linked Stream Data and Linked Data. In contrast to the existing systems, CQELS uses a “white box” approach and implements the required query operators natively to avoid the overhead and limitations of closed system regimes. CQELS provides a flexible query execution framework with the query processor dynamically adapting to the changes in the input data. During query execution, it continuously reorders operators according to some heuristics to achieve improved query execution in terms of delay and complexity. Moreover, external disk access on large Linked Data collections is reduced with the use of data encoding and caching of intermediate query results. To demonstrate the efficiency of our approach, we present extensive experimental performance evaluations in terms of query execution time, under varied query types, dataset sizes, and number of parallel queries. These results show that CQELS outperforms related approaches by orders of magnitude.
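
As a rough illustration of two ideas named in the abstract, data encoding and heuristic operator reordering, the following is a minimal sketch and not the CQELS implementation. It assumes that RDF terms are dictionary-encoded as integers and that each operator is a simple filter ordered by its observed pass rate. The names `Dictionary`, `Filter`, and `process` are hypothetical and do not come from the paper.

```python
class Dictionary:
    """Hypothetical encoder: maps RDF terms (strings) to compact integer ids."""

    def __init__(self):
        self._ids = {}
        self._terms = []

    def encode(self, term):
        # Assign the next free id the first time a term is seen.
        if term not in self._ids:
            self._ids[term] = len(self._terms)
            self._terms.append(term)
        return self._ids[term]

    def decode(self, ident):
        return self._terms[ident]


class Filter:
    """Hypothetical operator that records how many inputs it lets through."""

    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate
        self.seen = 0
        self.passed = 0

    def selectivity(self):
        # Observed pass rate; assumed to be 1.0 until evidence arrives.
        return self.passed / self.seen if self.seen else 1.0

    def apply(self, triple):
        self.seen += 1
        ok = bool(self.predicate(triple))
        self.passed += ok
        return ok


def process(stream, operators):
    """Run each encoded triple through the operators, most selective first."""
    results = []
    for triple in stream:
        # Assumed heuristic: re-sort before every element so that the
        # operator most likely to reject the input runs first.
        operators.sort(key=Filter.selectivity)
        if all(op.apply(triple) for op in operators):
            results.append(triple)
    return results


# Usage: encode three stream triples, then filter them adaptively.
terms = Dictionary()
stream = [
    tuple(terms.encode(t) for t in (f":sensor{i}", ":observes", ":temperature"))
    for i in range(3)
]
always = Filter("always", lambda t: True)
never = Filter("never", lambda t: False)
operators = [always, never]
matches = process(stream, operators)
```

After the first triple, the rejecting filter has the lower observed pass rate, so it moves to the front and the remaining triples are discarded after a single check. A real engine would reorder joins and window operators rather than plain filters, and would likely use cost estimates rather than raw pass rates.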

Keywords

Linked Streams, RDF Streams, Linked Data, stream processing, dynamic query planning, query optimisation

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Danh Le-Phuoc (1)
  • Minh Dao-Tran (2)
  • Josiane Xavier Parreira (1)
  • Manfred Hauswirth (1)

  1. Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland
  2. Institut für Informationssysteme, Technische Universität Wien, Austria