Benchmarking ETL Workflows

  • Alkis Simitsis
  • Panos Vassiliadis
  • Umeshwar Dayal
  • Anastasios Karagiannis
  • Vasiliki Tziovara
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5895)

Abstract

Extraction–Transform–Load (ETL) processes comprise complex data workflows that are responsible for the maintenance of a Data Warehouse. A plethora of ETL tools is currently available, constituting a multi-million dollar market. Each ETL tool uses its own technique for the design and implementation of an ETL workflow, which makes assessing ETL tools extremely difficult. In this paper, we identify common characteristics of ETL workflows in an effort to propose a unified evaluation method for ETL. We also identify the main points of interest in designing, implementing, and maintaining ETL workflows. Finally, we propose a principled organization of test suites, based on the TPC-H schema, for experimenting with ETL workflows.

Keywords

Data Warehouses, ETL, benchmark

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Alkis Simitsis (1)
  • Panos Vassiliadis (2)
  • Umeshwar Dayal (1)
  • Anastasios Karagiannis (2)
  • Vasiliki Tziovara (2)

  1. HP Labs, Palo Alto, USA
  2. Dept. of Computer Science, University of Ioannina, Ioannina