Skip to main content
Log in

Automating ETL processes using the domain-specific modeling approach

  • Original Article
  • Published:
Information Systems and e-Business Management Aims and scope Submit manuscript

Abstract

The development of Extract–Transform–Load (ETL) processes is the most complex, time-consuming and expensive phase of data warehouse development. Yet, the dynamics of modern business systems demand a more agile and flexible approach to their development. As a result, current research in this area is focused on ETL process conceptualization and the automation of ETL process development. This paper proposes a novel solution for automating ETL processes using the domain-specific modeling (DSM) approach. The proposed solution is based on the formal specification of ETL processes and the implementation of such formal specifications. Thus, in accordance with the DSM approach, several new domain-specific languages (DSLs) are introduced, each defining concepts relevant for a specific aspect of an ETL process. The focus of this paper is the actual implementation of the formal specification of an ETL process. To this end, a specific ETL platform (ETL-PL) is introduced to technologically support both the modeling of ETL processes (i.e., the creation of models in accordance with the introduced DSLs) and the automated transformation of the created models into the executable code of a specific application framework (representing ETL-PL’s execution environment). It should be emphasized that ETL-PL actually presumes the dynamic execution of ETL models or, more precisely, the executable code is generated at runtime. Thus the execution environment consists of code generator components and the components implementing the application framework. ETL-PL has been implemented as an extension of the .NET platform.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18
Fig. 19
Fig. 20
Fig. 21

Similar content being viewed by others

References

  • El Akkaoui Z, Zimányi E (2009) Defining ETL worfklows using BPMN and BPEL. In: Proceedings of DOLAP ‘09, (China), pp 41–48

  • El Akkaoui, Zimányi E, Mazón J-N, Trujillo J (2011) A model-driven framework for ETL process development. In: Proceedings of DOLAP ‘11, (UK), pp 45–52

  • El Akkaoui Z, Mazón J-N, Vaisman A, Zimányi E (2012) BPMN-based conceptual modeling of ETL processes. In: Data warehousing and knowledge discovery, LNCS 7448. Springer, Berlin, pp 1–14

  • Fowler M (2010) Domain-specific languages. Addison-Wesley Professional, Boston

    Google Scholar 

  • Greenfield J, Short K, Cook S, Kent S (2004) Software factories: assembling applications with patterns, models, frameworks, and tools. Wiley, Hoboken

    Google Scholar 

  • Hazzard K, Bock J (2013) Metaprogramming in.NET. Manning Publications, Greenwich

    Google Scholar 

  • Ivantsov R (2009) Irony—.NET language implementation kit. [Online] CodePlexProject Hosting for Open Source Software: http://irony.codeplex.com/

  • Jarke M, Lenzerini M, Vassiliou Y, Vassiliadis P (2003) Fundamentals of data warehouses. Springer, Berlin

    Book  Google Scholar 

  • Kelly S, Tolvanen JP (2008) Domain-specific modeling: enabling full code generation. Wiley, Hoboken

    Book  Google Scholar 

  • Kimball R, Caserta J (2004) The data warehouse ETL toolkit: practical techniques for extracting, cleaning, conforming, and delivering data. Wiley, Hoboken

    Google Scholar 

  • Kimball R, Ross M, Thornthwaite W, Mundy J, Becker B (2010) The Kimball group reader: relentlessly practical tools for data warehousing and business intelligence. Wiley, Hoboken

    Google Scholar 

  • Luján-Mora S, Trujillo J (2004) A data warehouse engineering process. In: Advances in information systems, LNCS 3261. Springer, Berlin, pp 14–23

  • Luján-Mora S, Vassiliadis P, Trujillo J (2004) Data mapping diagrams for data warehouse design with UML. In: Conceptual modeling-ER 2004, LNCS 3288. Springer, Berlin, pp 191–204

  • Mazón J-N, Trujillo J (2008) An MDA approach for the development of data warehouses. Decis Support Syst 45(1):41–58

    Article  Google Scholar 

  • Microsoft (2013) Modeling SDK for Microsoft Visual Studio 2013. [Online] http://www.microsoft.com/en-us/download/details.aspx?id=40754

  • Microsoft (2014a) Emitting dynamic methods and assemblies. [Online] https://msdn.microsoft.com/en-us/library/8ffc3x75%28v=vs.110%29.aspx

  • Microsoft (2014b) Expression trees (C# and Visual Basic). [Online] https://msdn.microsoft.com/en-us/library/bb397951.aspx

  • Muñoz L, Mazón JN, Pardillo J, Trujillo J (2008) Modelling ETL processes of data warehouses with UML activity diagrams. In: On the move to meaningful internet systems: OTM 2008 workshops, LNCS 5333. Springer, Berlin, pp 44–53

  • Muñoz L, Mazón JN, Trujillo J (2009) Automatic generation of ETL processes from conceptual models. In: Proceedings of DOLAP ‘09, (China), pp 33–40

  • Petrović M (2014) A model driven development approach for the data warehouse extract, transform and load process. Ph.D. Thesis final version (in Serbian), Faculty of Organizational Sciences, University of Belgrade, Serbia

  • Simitsis A (2005) Mapping conceptual to logical models for ETL processes. In: Proceedings of DOLAP ‘05, (Germany), pp 67–76

  • Simitsis A, Vassiliadis P (2003) A methodology for the conceptual modeling of ETL processes. In: Proceedings of the decision systems engineering—DSE ‘03, (Austria), pp 305–316

  • Simitsis A, Vassiliadis P (2008) A method for the mapping of conceptual designs to logical blueprints for ETL processes. Decis Support Syst 45(1):22–40

    Article  Google Scholar 

  • Simitsis A, Vassiliadis P, Terrovitis M, Skiadopoulos S (2005) Graph-based modeling of ETL activities with multi-level transformations and updates. In: Data warehousing and knowledge discovery, LNCS 3589. Springer, Berlin, pp 43–52

  • Troelsen A (2012) Pro C# 5.0 and the.NET 4.5 Framework. Apress

  • Trujillo J, Luján-Mora S (2003) A UML based approach for modeling ETL Processes in data warehouses. In: Conceptual modeling-ER 2003, LNCS 2813. Springer, Berlin, pp 307–320

  • Turajlić N, Petrović M, Vučković M (2014) Analysis of ETL process development approaches: some open issues. In: Proceedings of SYMORG’14, pp 45–51

  • Vassiliadis P, Simitsis A, Skiadopoulos S (2002) Modeling ETL activities as graphs. In: Proceedings of DMDW’02, pp 52–61

  • Vassiliadis P, Simitsis A, Skiadopoulos S (2002) Conceptual modeling for ETL processes. In: Proceedings of DOLAP ‘02, (USA), pp 14–21

  • Vassiliadis P, Simitsis A, Georgantas P, Terrovitis M (2003) A framework for the design of ETL scenarios. In: Advanced information systems engineering, LNCS 2681. Springer, Berlin, pp 520–535

  • Vassiliadis P, Simitsis A, Georgantas P, Terrovitis M, Skiadopoulos S (2005) A generic and customizable framework for the design of ETL scenarios. Inf Syst 30(7):492–525

    Article  Google Scholar 

  • Vassiliadis P, Simitsis A, Baikousi E (2009) A taxonomy of ETL activities. In: Proceedings of DOLAP’09, (China), pp 25–32

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Marko Petrović.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Petrović, M., Vučković, M., Turajlić, N. et al. Automating ETL processes using the domain-specific modeling approach. Inf Syst E-Bus Manage 15, 425–460 (2017). https://doi.org/10.1007/s10257-016-0325-8

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10257-016-0325-8

Keywords

Navigation