Abstract
Apache Hadoop is an open-source project that allows for the distributed processing of huge data sets across clusters of computers using simple programming models. It is designed to store, analyze, and access massive amounts of data quickly across clusters of commodity hardware. Hadoop has several large-scale data processing tools, each with its own purpose. The Hadoop ecosystem has emerged as a cost-effective way of working with large data sets. It imposes a particular programming model, called MapReduce, for breaking up computation tasks into units that can be distributed around a cluster of commodity and server-class hardware, thereby providing cost-effective horizontal scalability. This chapter provides introductory material about the various Hadoop ecosystem tools and describes their usage in data analytics. Each tool has its own significance and function in a data analytics environment.
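To make the MapReduce idea concrete, the following is a minimal single-process sketch in plain Python, not Hadoop's actual API. It shows how a word count can be split into independent map units whose outputs are grouped by key and then reduced. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions chosen for illustration; on a real cluster the framework would distribute the map and reduce calls across nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values collected for one key."""
    return key, sum(values)


def word_count(lines):
    # Each line is mapped independently, which is what makes the work
    # distributable; here everything runs sequentially in one process.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, used only for illustration.
    print(word_count(["Hive Pig HBase", "hive pig", "hive"]))
```

Because each call to `map_phase` depends only on its own record, and each call to `reduce_phase` only on one key's values, both steps can in principle run in parallel on different machines, which is the property the abstract refers to as horizontal scalability.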
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Maheswari, N., Sivagami, M. (2016). Large-Scale Data Analytics Tools: Apache Hive, Pig, and HBase. In: Mahmood, Z. (eds) Data Science and Big Data Computing. Springer, Cham. https://doi.org/10.1007/978-3-319-31861-5_9
DOI: https://doi.org/10.1007/978-3-319-31861-5_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31859-2
Online ISBN: 978-3-319-31861-5
eBook Packages: Computer Science, Computer Science (R0)