Programmatic ETL

Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 324)


Extract-Transform-Load (ETL) processes are used to extract data, transform it, and load it into data warehouses (DWs). The dominant ETL tools use graphical user interfaces (GUIs) in which the developer “draws” the ETL flow by connecting steps/transformations with lines. This gives an easy overview, but it can also be tedious and require much trivial work for simple tasks. We therefore challenge this approach and propose to do ETL programming by writing code. To make the programming easy, we present the Python-based framework pygrametl, which offers commonly used functionality for ETL development. By using the framework, the developer can efficiently create effective ETL solutions that exploit the full power of programming. In this chapter, we present our work on pygrametl and related activities. Further, we consider some of the lessons learned during the development of pygrametl as an open source framework.
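
As a rough illustration of what expressing an ETL flow as code (rather than as a drawn diagram) can look like, the sketch below uses only Python's standard library (csv, sqlite3). It is not pygrametl's API. The table names, column names, sample rows, and the helper functions ensure_book and run_etl are all hypothetical and chosen only for this example.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real flow would read files or a source database.
SOURCE_CSV = """book,genre,store,sale
Nineteen Eighty-Four,Novel,Aalborg,50
Calvin and Hobbes One,Comic,Aalborg,25
Nineteen Eighty-Four,Novel,Copenhagen,75
"""


def ensure_book(conn, title, genre):
    """Return the surrogate key of a book, inserting the dimension row if missing."""
    row = conn.execute(
        "SELECT bookid FROM book WHERE title = ?", (title,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO book (title, genre) VALUES (?, ?)", (title, genre)
    )
    return cur.lastrowid


def run_etl(conn, source_text):
    """Extract rows from CSV text, transform them, and load a tiny star schema."""
    conn.execute(
        "CREATE TABLE book (bookid INTEGER PRIMARY KEY, title TEXT UNIQUE, genre TEXT)"
    )
    conn.execute(
        "CREATE TABLE sales (bookid INTEGER REFERENCES book(bookid), store TEXT, sale INTEGER)"
    )
    # Extract: read each source row as a dict keyed by the CSV header.
    for record in csv.DictReader(io.StringIO(source_text)):
        # Transform: clean the text attributes and cast the measure to an integer.
        title = record["book"].strip()
        genre = record["genre"].strip()
        sale = int(record["sale"])
        # Load: look up (or create) the dimension member, then insert the fact.
        bookid = ensure_book(conn, title, genre)
        conn.execute(
            "INSERT INTO sales (bookid, store, sale) VALUES (?, ?, ?)",
            (bookid, record["store"].strip(), sale),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
run_etl(conn, SOURCE_CSV)
```

Because the flow is ordinary code, the lookup-or-insert step is a plain function that can be reused, tested, and combined with loops and conditionals. Packaging such commonly needed steps is the kind of functionality the abstract attributes to a framework.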



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science, Aalborg University, Aalborg, Denmark
