Abstract
The continuously growing importance of information, together with the rapid development of systems that collect and process huge volumes of data, has made processing and analyzing data a major challenge. The response to the current and future needs of the market is a data warehouse supported by a data extraction process. A stream ETL (extract, transform, load) process enables real-time data to be loaded without interrupting processing or the analysis that supports decision-making. This paper presents a first implementation of the stream ETL process, which originates from the model and concept of a Stream Data Warehouse. The first part of the paper presents the concept of the Stream Data Warehouse and its major components, including stream ETL. The second part describes the developed stream ETL engine and reports the results of accuracy and efficiency analyses. The paper concludes with a description of open issues to be addressed in further research on the presented solution.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Gorawski, M., Gorawska, A. (2014). Research on the Stream ETL Process. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures, and Structures. BDAS 2014. Communications in Computer and Information Science, vol 424. Springer, Cham. https://doi.org/10.1007/978-3-319-06932-6_7
DOI: https://doi.org/10.1007/978-3-319-06932-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06931-9
Online ISBN: 978-3-319-06932-6
eBook Packages: Computer Science, Computer Science (R0)