Abstract
ETL (Extract, Transform, Load) is the widely used standard process for creating and maintaining a Data Warehouse (DW). It is also the most resource-, cost- and time-intensive process in DW implementation and maintenance. Nowadays, many solutions based on Graphical User Interfaces (GUIs) are available to facilitate ETL processes. Despite the popularity of GUI-based tools, the approach still has some drawbacks. This paper focuses on an alternative approach to ETL development: hand coding. In some contexts it is appropriate to custom-develop ETL code, which can be cheaper, faster and more maintainable. Several well-known code-based open source ETL tools (Pygrametl, Petl, Scriptella, R_etl) developed by the academic community are studied in this article, and their architecture and implementation details are addressed. The aim of this paper is to present a comparative evaluation of these code-based ETL tools, not to claim that code-based ETL is superior to the GUI-based approach. The choice between the code-based and GUI-based approaches depends on the particular requirements, data strategy and infrastructure of each organization.
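To make the hand-coded alternative concrete, the following is a minimal sketch of an ETL job written directly in Python using only the standard library (csv and sqlite3), not any of the surveyed tools. The sample data, the table names (dim_customer, fact_sales) and all helper functions are hypothetical. The ensure_customer helper illustrates the lookup-or-insert step for a dimension table that programmable frameworks such as pygrametl wrap for the developer; it is not pygrametl's API.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would come from a file or an operational system.
SOURCE_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,n/a
3,alice,4.25
"""


def extract(text):
    """Extract: read raw rows from CSV text as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast types, normalise names and drop rows with a non-numeric amount."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip malformed records instead of failing the whole load
        clean.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"].strip().lower(),
            "amount": amount,
        })
    return clean


def ensure_customer(conn, name):
    """Return the customer's surrogate key, inserting a new dimension member if missing."""
    found = conn.execute(
        "SELECT customer_key FROM dim_customer WHERE name = ?", (name,)
    ).fetchone()
    if found:
        return found[0]
    return conn.execute(
        "INSERT INTO dim_customer (name) VALUES (?)", (name,)
    ).lastrowid


def load(conn, rows):
    """Load: write fact rows keyed by the customer dimension's surrogate key."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer ("
        "customer_key INTEGER PRIMARY KEY, name TEXT UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales ("
        "order_id INTEGER PRIMARY KEY, customer_key INTEGER, amount REAL)"
    )
    for row in rows:
        key = ensure_customer(conn, row["customer"])
        conn.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?)",
            (row["order_id"], key, row["amount"]),
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(SOURCE_CSV)))
    print(conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0])  # prints 2
```

Because every step is ordinary code, it can be unit-tested, versioned and reviewed like any other program; the trade-off is that the developer, rather than a GUI tool, is responsible for concerns such as error handling and dimension management.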
References
2017 Gartner Magic Quadrant for Data Integration Tools. https://www.informatica.com/in/data-integration-magic-quadrant.html. Accessed 06 Dec 2017
Bubbles — Data Brewery. http://bubbles.databrewery.org/. Accessed 06 Dec 2017
Data Pipeline. https://www.northconcepts.com/. Accessed 06 Dec 2017
Petl - Extract, Transform and Load (Tables of Data). http://petl.readthedocs.io/en/latest/. Accessed 06 Dec 2017
Scriptella/scriptella-etl. https://github.com/scriptella/scriptella-etl/wiki. Accessed 06 Dec 2017
Welcome to Scriptella ETL Project. http://scriptella.org/. Accessed 06 Dec 2017
Data Integration. http://www.pentaho.com/product/data-integration. Accessed 06 Feb 2018
Data Integration: Talend Enterprise Data Integration Services. http://www.talend.com/products/data-integration. Accessed 06 Feb 2018
Data Integration Tools and Software Solutions — Informatica India. https://www.informatica.com/in/products/data-integration.html. Accessed 06 Feb 2018
IBM InfoSphere Information Server. http://www-03.ibm.com/software/products/en/infosphere-information-server/. Accessed 06 Feb 2018
Oracle Data Integrator. http://www.oracle.com/technetwork/middleware/data-integrator/overview/index.html. Accessed 06 Feb 2018
Pygrametl, ETL programming in Python. http://www.pygrametl.org/. Accessed 25 Feb 2018
Stiivi/bubbles. https://github.com/stiivi/bubbles. Accessed 25 Feb 2018
ETL. https://cran.r-project.org/web/packages/etl/README.html. Accessed 10 Mar 2018
Baumer, B.: etl: Extract-Transform-Load Framework for Medium Data (2017). R package version 0.3.7. http://github.com/beanumber/etl
Baumer, B.: A grammar for reproducible and painless extract-transform-load operations on medium data. arXiv preprint arXiv:1708.07073 (2017)
Eckerson, W., White, C.: Evaluating ETL and data integration platforms. Report of the Data Warehousing Institute 184 (2003)
Inmon, W.: Building the Data Warehouse. Wiley, Hoboken (2005)
Kabiri, A., Chiadmi, D.: Survey on ETL processes. J. Theor. Appl. Inf. Technol. 54(2), 219–229 (2013)
Liu, X., Thomsen, C., Pedersen, T.: MapReduce-based dimensional ETL made easy. Proc. VLDB Endow. 5(12), 1882–1885 (2012)
Liu, X., Thomsen, C., Pedersen, T.B.: ETLMR: a highly scalable dimensional ETL framework based on MapReduce. In: Cuzzocrea, A., Dayal, U. (eds.) DaWaK 2011. LNCS, vol. 6862, pp. 96–111. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23544-3_8
Majchrzak, T.A., Jansen, T., Kuchen, H.: Efficiency evaluation of open source ETL tools. In: Proceedings of the 2011 ACM Symposium on Applied Computing, pp. 287–294. ACM (2011)
Nath, R., Hose, K., Pedersen, T.: Towards a programmable semantic extract-transform-load framework for semantic data warehouses. In: Proceedings of the ACM Eighteenth International Workshop on Data Warehousing and OLAP, pp. 15–24. ACM (2015)
Nath, R., Hose, K., Pedersen, T., Romero, O.: SETL: a programmable semantic extract-transform-load framework for semantic data warehouses. Inf. Syst. 68, 17–43 (2017)
Pall, A.S., Khaira, J.S.: A comparative review of extraction, transformation and loading tools. Database Syst. J. BOARD 4(2), 42–51 (2013)
Radonić, M., Mekterović, I.: ETLator - a scripting ETL framework. In: 2017 40th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pp. 1349–1354. IEEE (2017)
Schmidt, N., Rosa, M., Garcia, R., Molina, E., Reyna, R., Gonzalez, J.: ETL tool evaluation - a criteria framework, pp. 1–12 (2011)
Thomsen, C., Pedersen, T.: Pygrametl: a powerful programming framework for extract-transform-load programmers. In: Proceedings of the ACM Twelfth International Workshop on Data Warehousing and OLAP, pp. 49–56. ACM (2009)
Thomsen, C., Pedersen, T.: Easy and effective parallel programmable ETL. In: Proceedings of the ACM 14th International Workshop on Data Warehousing and OLAP, pp. 37–44. ACM (2011)
Thomsen, C., Pedersen, T.B.: A survey of open source tools for business intelligence. In: Tjoa, A.M., Trujillo, J. (eds.) DaWaK 2005. LNCS, vol. 3589, pp. 74–84. Springer, Heidelberg (2005). https://doi.org/10.1007/11546849_8
Vassiliadis, P.: A survey of extract-transform-load technology. Int. J. Data Warehous. Min. 5(3), 1–27 (2009)
Vassiliadis, P., Simitsis, A., Baikousi, E.: A taxonomy of ETL activities. In: Proceedings of the ACM Twelfth International Workshop on Data Warehousing and OLAP, pp. 25–32. ACM (2009)
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Biswas, N., Sarkar, A., Mondal, K.C. (2019). Empirical Analysis of Programmable ETL Tools. In: Mandal, J., Mukhopadhyay, S., Dutta, P., Dasgupta, K. (eds) Computational Intelligence, Communications, and Business Analytics. CICBA 2018. Communications in Computer and Information Science, vol 1031. Springer, Singapore. https://doi.org/10.1007/978-981-13-8581-0_22
DOI: https://doi.org/10.1007/978-981-13-8581-0_22
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-8580-3
Online ISBN: 978-981-13-8581-0
eBook Packages: Computer Science, Computer Science (R0)