Research on the Stream ETL Process

  • Conference paper
Beyond Databases, Architectures, and Structures (BDAS 2014)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 424)

Abstract

The continuously growing importance of information, combined with the rapid development of systems that collect and process huge volumes of data, has made processing and analyzing that data a major problem. The answer to the current and future needs of the market is a data warehouse supported by a data extraction process. The stream ETL process discussed here enables loading real-time data without interrupting processing or the analyses that support decision-making processes. This paper presents the first implementation of the stream ETL process, which originates from the model and concept of a Stream Data Warehouse. The first part of the paper presents the concept of the Stream Data Warehouse and its major components, including stream ETL. The second part describes the developed stream ETL engine and the results of the accuracy and efficiency analysis performed on it. Finally, the paper concludes with a description of future research issues that will be addressed in further work on the presented solution.
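The abstract describes stream ETL only at the conceptual level, and the design of the authors' engine is not given here. As a rough illustration of the general idea, the minimal Python sketch below runs extract, transform and load incrementally in micro-batches and hands control back after each load, so analytical queries can be answered while data keeps arriving. Everything in it is a hypothetical assumption for illustration: the `sensor_id,value` record format, the names `Reading`, `extract`, `transform`, `Warehouse` and `stream_etl`, the negative-value cleansing rule, and the micro-batch size.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

# Hypothetical illustration of stream ETL; not the engine described in the paper.


@dataclass(frozen=True)
class Reading:
    """One hypothetical source record, e.g. a telemetry measurement."""
    sensor_id: str
    value: float


def extract(raw_lines: Iterable[str]) -> Iterator[Reading]:
    """Extract: parse 'sensor_id,value' lines as they arrive, skipping malformed ones."""
    for line in raw_lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue
        try:
            value = float(parts[1])
        except ValueError:
            continue
        yield Reading(parts[0], value)


def transform(readings: Iterable[Reading]) -> Iterator[Reading]:
    """Transform: drop negative values (a stand-in cleansing rule)."""
    for r in readings:
        if r.value >= 0:
            yield r


class Warehouse:
    """Toy fact store keeping per-sensor sums and counts; queryable at any time."""

    def __init__(self) -> None:
        self.totals: dict[str, float] = {}
        self.counts: dict[str, int] = {}

    def load(self, batch: list[Reading]) -> None:
        """Load: apply one micro-batch incrementally instead of rebuilding the store."""
        for r in batch:
            self.totals[r.sensor_id] = self.totals.get(r.sensor_id, 0.0) + r.value
            self.counts[r.sensor_id] = self.counts.get(r.sensor_id, 0) + 1

    def average(self, sensor_id: str) -> Optional[float]:
        """Analytical query over whatever has been loaded so far."""
        n = self.counts.get(sensor_id, 0)
        return self.totals[sensor_id] / n if n else None


def stream_etl(raw_lines: Iterable[str], warehouse: Warehouse,
               batch_size: int = 2) -> Iterator[int]:
    """Run extract -> transform -> load continuously, yielding the size of each
    loaded micro-batch so that queries can run between loads."""
    batch: list[Reading] = []
    for r in transform(extract(raw_lines)):
        batch.append(r)
        if len(batch) == batch_size:
            warehouse.load(batch)
            yield len(batch)
            batch = []
    if batch:  # flush the final, partial micro-batch
        warehouse.load(batch)
        yield len(batch)


if __name__ == "__main__":
    wh = Warehouse()
    feed = ["s1,10", "s2,4", "bad line", "s1,-3", "s1,20", "s2,6"]
    for loaded in stream_etl(feed, wh):
        # Query between loads: prints "2 10.0", then "2 15.0"
        print(loaded, wh.average("s1"))
```

Yielding after each micro-batch is what lets analysis proceed without waiting for a full batch load to finish. A production stream ETL engine would typically also need concurrency control, fault tolerance and windowing, which this sketch deliberately omits.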

Author information

Correspondence to Marcin Gorawski.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Gorawski, M., Gorawska, A. (2014). Research on the Stream ETL Process. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures, and Structures. BDAS 2014. Communications in Computer and Information Science, vol 424. Springer, Cham. https://doi.org/10.1007/978-3-319-06932-6_7

  • DOI: https://doi.org/10.1007/978-3-319-06932-6_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06931-9

  • Online ISBN: 978-3-319-06932-6

  • eBook Packages: Computer Science, Computer Science (R0)