Analysis of Existing Concepts of Optimization of ETL-Processes

  • Conference paper
Software Engineering Methods in Intelligent Algorithms (CSOC 2019)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 984)

Abstract

Extract-Transform-Load (ETL) describes the process of loading data from a source into a destination. The source and the destination can be physically separated, and transformations may take place in between. Data preparation happens regularly. To minimize interference with other business processes and to guarantee high data availability, these processes are often run at night. The demand for shorter ETL processing times is therefore increasing steadily. Besides data availability and timeliness, further reasons are the transition to real-time or near-real-time data analysis and the growing data volume. Several approaches to optimizing ETL processes exist; this article examines them in detail, with a closer look at the advantages and disadvantages of each. Finally, the approaches are compared with one another, and a recommendation is given depending on the use case.

Author information

Corresponding author

Correspondence to Sarah Myriam Lydia Hahn.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Hahn, S.M.L. (2019). Analysis of Existing Concepts of Optimization of ETL-Processes. In: Silhavy, R. (eds) Software Engineering Methods in Intelligent Algorithms. CSOC 2019. Advances in Intelligent Systems and Computing, vol 984. Springer, Cham. https://doi.org/10.1007/978-3-030-19807-7_7
