Performance Comparison of Hadoop Based Tools with Commercial ETL Tools – A Case Study

  • Conference paper
Big Data Analytics (BDA 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8302)

Abstract

Data analysis is an essential business need for organizations seeking to optimize performance. Data is loaded into a data warehouse (DWH) using Extract, Transform and Load (ETL), and analytics is then run on the DWH. The largest share of cost and execution time is associated with the Extract and Transform (ET) part of this workflow. Recent approaches based on Hadoop, an open source Apache framework for data-intensive scalable computing, provide an alternative for ET that is both cheaper and faster than prevalent commercial ETL tools. This paper presents a case study with experimental metrics in support of that claim. The reduction in cost makes the approach viable for small and large organizations alike, and the reduction in execution time makes it possible to provide online data services.

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Misra, S., Saha, S.K., Mazumdar, C. (2013). Performance Comparison of Hadoop Based Tools with Commercial ETL Tools – A Case Study. In: Bhatnagar, V., Srinivasa, S. (eds) Big Data Analytics. BDA 2013. Lecture Notes in Computer Science, vol 8302. Springer, Cham. https://doi.org/10.1007/978-3-319-03689-2_12

  • DOI: https://doi.org/10.1007/978-3-319-03689-2_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-03688-5

  • Online ISBN: 978-3-319-03689-2

  • eBook Packages: Computer Science, Computer Science (R0)
