
ETL Workflows: From Formal Specification to Optimization

  • Timos K. Sellis
  • Alkis Simitsis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4690)

Abstract

In this paper, we present our work on a framework for the modeling and optimization of Extraction-Transformation-Loading (ETL) workflows. The goal of this research was to facilitate, manage, and optimize the design and implementation of ETL workflows both during the initial design and deployment stage and during the continuous evolution of a data warehouse. In particular, our results include: (a) a novel conceptual model for tracing inter-attribute relationships and the respective ETL transformations in the early stages of a data warehouse project, along with an attempt to use ontology-based mechanisms to semi-automatically capture the semantics of, and the relationships among, the various sources; (b) a novel logical model for the representation of ETL workflows with two main characteristics: genericity and customization; (c) the semi-automatic transition from the conceptual to the logical model for ETL workflows; and (d) the tuning of an ETL workflow to optimize the execution order of its operations. Finally, we discuss issues for future work that we consider important, as a step towards applying the above research results to other areas as well.
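Item (d), reordering the operations of a workflow to lower its execution cost, can be illustrated with a deliberately small sketch. It assumes a linear workflow in which each activity has a hypothetical selectivity (fraction of rows it lets through) and per-row cost, plus a list of precedence constraints, and it finds the cheapest valid ordering by brute force over all permutations. This is not the authors' optimization algorithm; the cost model, activity names and dependency are illustrative assumptions.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Activity:
    """One activity in a linear ETL workflow (hypothetical toy cost model)."""
    name: str
    selectivity: float   # fraction of input rows that survive the activity
    cost_per_row: float  # processing cost charged per input row


def workflow_cost(order, input_rows):
    """Sum per-activity costs while propagating row counts through the chain."""
    total, rows = 0.0, float(input_rows)
    for act in order:
        total += rows * act.cost_per_row
        rows *= act.selectivity
    return total


def is_valid(order, must_precede):
    """Check that every (before, after) dependency pair is respected."""
    pos = {act.name: i for i, act in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in must_precede)


def best_order(activities, must_precede, input_rows):
    """Exhaustively search valid orderings; only feasible for tiny workflows."""
    candidates = (
        p for p in permutations(activities) if is_valid(p, must_precede)
    )
    return min(candidates, key=lambda p: workflow_cost(p, input_rows))


# Hypothetical workflow in its initial (designer-given) order.
workflow = [
    Activity("not_null_check", 0.9, 1.0),
    Activity("currency_convert", 1.0, 5.0),
    Activity("surrogate_key", 1.0, 3.0),
    Activity("filter_recent", 0.2, 1.0),
]
# Assumed dependency: conversion needs non-null amounts, so the check runs first.
deps = [("not_null_check", "currency_convert")]

print(workflow_cost(workflow, 1000))  # 9100.0
best = best_order(workflow, deps, 1000)
print([a.name for a in best], workflow_cost(best, 1000))
# ['filter_recent', 'not_null_check', 'currency_convert', 'surrogate_key'] 2640.0
```

Moving the highly selective filter ahead of the expensive conversion, while still respecting the dependency, cuts the estimated cost from 9100 to 2640 units in this toy model. Brute force grows factorially with the number of activities, so a realistic optimizer would have to explore and prune the space of orderings far more selectively.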

Keywords

Logical Model, Data Warehouse, Operational Semantic, Execution Order, Template Activity (machine-generated keywords, not supplied by the authors)



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Timos K. Sellis (1)
  • Alkis Simitsis (2)
  1. School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Hellas
  2. IBM Almaden Research Center, San Jose, CA 95120, USA
