ETL Workflows: From Formal Specification to Optimization
In this paper, we present a framework for the modeling and optimization of Extraction-Transformation-Loading (ETL) workflows. The goal of this research was to facilitate, manage, and optimize the design and implementation of ETL workflows, both during the initial design and deployment stage and during the continuous evolution of a data warehouse. In particular, our results include: (a) a novel conceptual model for tracing inter-attribute relationships and the respective ETL transformations in the early stages of a data warehouse project, along with an attempt to use ontology-based mechanisms to semi-automatically capture the semantics of, and the relationships among, the various sources; (b) a novel logical model for the representation of ETL workflows with two main characteristics: genericity and customization; (c) the semi-automatic transition from the conceptual to the logical model for ETL workflows; and (d) the tuning of an ETL workflow through the optimization of the execution order of its operations. Finally, we discuss open issues for future work that we consider important, as a step towards applying the above research results to other areas as well.
Keywords: Logical Model, Data Warehouse, Operational Semantic, Execution Order, Template Activity