Streaming ETL in Polystore Era

  • Nabila Berkani
  • Ladjel Bellatreche
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11336)

Abstract

In today’s digital environment, businesses have to access, store and analyze, in real time, vast amounts of data issued from streaming graph-structured data sources. To meet these requirements, companies that rely on data warehouse (\(\mathcal{DW}\)) technology have to combine hardware and software solutions to reduce the latency between a \(\mathcal{DW}\) and its data sources. The proliferation of advanced hardware deployment platforms such as polystores represents an opportunity, as pointed out in recent studies. However, deploying a graph-structured \(\mathcal{DW}\) over a polystore is not a simple task, since it requires two important phases: data partitioning and allocation. We claim that these phases have to be connected to the ETL (Extract, Transform, Load) phase, especially its loading process. This connection calls into question the conventional scheduling of ETL and deployment processes. In this paper, we present a new approach that connects ETL and deployment processes and challenges their traditional scheduling in order to meet real-time analysis requirements.

Keywords

RDF, Fragmentation, Allocation, ETL, Polystore

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Ecole Nationale Supérieure d’Informatique, Oued-Smar, Alger, Algeria
  2. LIAS/ISAE-ENSMA, Poitiers, France
