Empirical Analysis of Programmable ETL Tools

  • Conference paper
Computational Intelligence, Communications, and Business Analytics (CICBA 2018)

Abstract

ETL (Extract, Transform, Load) is the widely used standard process for creating and maintaining a Data Warehouse (DW). ETL is the most resource-, cost- and time-demanding process in DW implementation and maintenance. Nowadays, many Graphical User Interface (GUI) based solutions are available to facilitate ETL processes. Despite the high popularity of GUI-based tools, this approach still has some downsides. This paper focuses on the alternative approach to ETL development: hand coding. In some contexts, it is appropriate to custom-develop ETL code, which can be cheaper, faster and more maintainable. Several well-known code-based open source ETL tools (Pygrametl, Petl, Scriptella, R_etl) developed by the academic world are studied in this article, and their architecture and implementation details are addressed. The aim of this paper is to present a comparative evaluation of these code-based ETL tools, not to claim that code-based ETL is superior to the GUI-based approach. The choice between the code-based and GUI-based approaches depends on the particular requirements, data strategy and infrastructure of an organization.

Author information

Correspondence to Neepa Biswas or Kartick Chandra Mondal.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Biswas, N., Sarkar, A., Mondal, K.C. (2019). Empirical Analysis of Programmable ETL Tools. In: Mandal, J., Mukhopadhyay, S., Dutta, P., Dasgupta, K. (eds) Computational Intelligence, Communications, and Business Analytics. CICBA 2018. Communications in Computer and Information Science, vol 1031. Springer, Singapore. https://doi.org/10.1007/978-981-13-8581-0_22

  • DOI: https://doi.org/10.1007/978-981-13-8581-0_22

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-8580-3

  • Online ISBN: 978-981-13-8581-0

  • eBook Packages: Computer Science, Computer Science (R0)
