Planning Ahead: Stream-Driven Linked-Data Access Under Update-Budget Constraints

  • Shen Gao
  • Daniele Dell’Aglio
  • Soheila Dehghanzadeh
  • Abraham Bernstein
  • Emanuele Della Valle
  • Alessandra Mileo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9981)

Abstract

Data stream applications are becoming increasingly popular on the web. In these applications, one query pattern is especially prominent: a join between a continuous data stream and some background data (BGD). Often, the target BGD is large, maintained externally, slowly changing, and costly to query (in terms of both time and money). Hence, practical applications usually maintain a local (cached) view of the relevant BGD. Because these caches are not updated in step with the original BGD, they must be refreshed under realistic budget constraints (in terms of latency, computation time, and possibly financial cost) to prevent stale data from leading to wrong answers. This paper proposes to model the join between streams and the BGD as a bipartite graph. By exploiting the graph structure, we keep the quality of results good enough without refreshing the entire cache for each evaluation. We also introduce two extensions to this method. First, we consider a continuous join between recent portions of a data stream and some BGD to focus on updates that have the longest effect. Second, we consider the future impact of a query to the BGD by proposing to delay some updates to provide fresher answers in the future. By extending an existing stream processor with the proposed policies, we empirically show that we can improve result freshness by 93% over baseline algorithms such as Random Selection or Least Recently Updated.
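The idea of choosing cache refreshes from the join graph under a fixed budget can be illustrated with a small sketch. This is not the authors' algorithm: the degree-based ranking, the function names (`build_join_graph`, `select_refreshes`, `lru_refreshes`), and the toy window are all assumptions made for illustration. The sketch links each stream item in the current window to the BGD key it joins with, then spends the budget on the keys that affect the most items, and compares that with a Least Recently Updated baseline.

```python
from collections import defaultdict

# Illustrative sketch only: the ranking rule below (join degree first) is an
# assumption, not the policy described in the paper.

def build_join_graph(window, join_key):
    """Bipartite graph as a map: BGD key -> stream items in the window joining it."""
    graph = defaultdict(list)
    for item in window:
        graph[join_key(item)].append(item)
    return graph

def select_refreshes(graph, last_refreshed, budget):
    """Pick at most `budget` BGD keys, highest join degree first.

    Ties are broken in favour of the key refreshed longest ago.
    """
    ranked = sorted(
        graph,
        key=lambda k: (-len(graph[k]), last_refreshed.get(k, float("-inf"))),
    )
    return ranked[:budget]

def lru_refreshes(graph, last_refreshed, budget):
    """Baseline: pick the `budget` keys that were refreshed longest ago."""
    ranked = sorted(graph, key=lambda k: last_refreshed.get(k, float("-inf")))
    return ranked[:budget]

# Hypothetical window of stream items joined with cached BGD on "user".
window = [
    {"user": "alice", "mentions": 3},
    {"user": "bob", "mentions": 1},
    {"user": "alice", "mentions": 5},
    {"user": "carol", "mentions": 2},
    {"user": "alice", "mentions": 1},
    {"user": "bob", "mentions": 4},
]
# Hypothetical timestamps of the last refresh of each cached BGD entry.
last_refreshed = {"alice": 90, "bob": 50, "carol": 10}

graph = build_join_graph(window, lambda item: item["user"])
chosen = select_refreshes(graph, last_refreshed, budget=2)
baseline = lru_refreshes(graph, last_refreshed, budget=2)
print("graph-based:", chosen)
print("LRU baseline:", baseline)
```

With a budget of two refreshes, the graph-based choice covers five of the six items in the window, while the baseline covers three. A policy in the spirit of the abstract would additionally weigh how long an item stays in the window and whether delaying a refresh pays off later; neither is modelled here.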

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Shen Gao (1)
  • Daniele Dell’Aglio (1, 2)
  • Soheila Dehghanzadeh (3)
  • Abraham Bernstein (1)
  • Emanuele Della Valle (2)
  • Alessandra Mileo (3)

  1. Department of Informatics, University of Zurich, Zurich, Switzerland
  2. DEIB, Politecnico di Milano, Milano, Italy
  3. INSIGHT Research Center, NUI Galway, Galway, Ireland
