Abstract
As Hadoop-based big data frameworks grow in pervasiveness and scale, realistically benchmarking Hadoop systems becomes critically important to the Hadoop community and industry. In this paper, we present our experience of benchmarking Hadoop with HiBench (an open source Hadoop benchmark suite widely used by Hadoop users), and, building on that experience, introduce our recent work on advanced end-to-end ETL-recommendation pipelines.
Jinquan Dai: This work was done while the author was working at Intel Asia-Pacific Research and Development Ltd.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Yi, L., Dai, J. (2014). Experience from Hadoop Benchmarking with HiBench: From Micro-Benchmarks Toward End-to-End Pipelines. In: Rabl, T., Raghunath, N., Poess, M., Bhandarkar, M., Jacobsen, HA., Baru, C. (eds) Advancing Big Data Benchmarks. WBDB 2013. Lecture Notes in Computer Science, vol 8585. Springer, Cham. https://doi.org/10.1007/978-3-319-10596-3_4
DOI: https://doi.org/10.1007/978-3-319-10596-3_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10595-6
Online ISBN: 978-3-319-10596-3
eBook Packages: Computer Science (R0)