Abstract
Data analysis is an essential business need for organizations seeking to optimize performance. Data is loaded into a data warehouse (DWH) using Extract, Transform and Load (ETL), and analytics are run on the DWH. The largest share of cost and execution time is associated with the Extract and Transform (ET) part of this workflow. Recent approaches based on Hadoop, an open source Apache framework for data-intensive scalable computing, provide an alternative for ET that is both cheaper and faster than prevalent commercial ETL tools. This paper presents a case study with experimental metrics in support of this claim. The reduction in cost makes the approach viable for small and large organizations alike, and the reduction in execution time makes it possible to provide online data services.
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Misra, S., Saha, S.K., Mazumdar, C. (2013). Performance Comparison of Hadoop Based Tools with Commercial ETL Tools – A Case Study. In: Bhatnagar, V., Srinivasa, S. (eds) Big Data Analytics. BDA 2013. Lecture Notes in Computer Science, vol 8302. Springer, Cham. https://doi.org/10.1007/978-3-319-03689-2_12
DOI: https://doi.org/10.1007/978-3-319-03689-2_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03688-5
Online ISBN: 978-3-319-03689-2
eBook Packages: Computer Science (R0)