Large-Scale Data Analytics Tools: Apache Hive, Pig, and HBase

Maheswari, N.; Sivagami, M.

doi:10.1007/978-3-319-31861-5_9

Abstract

Apache Hadoop is an open-source project that supports the distributed processing of very large data sets across clusters of computers using simple programming models. It is designed to store, analyze, and access massive amounts of data quickly on clusters of commodity hardware. Hadoop has several large-scale data processing tools, each with its own purpose, and the Hadoop ecosystem has emerged as a cost-effective way of working with large data sets. It imposes a particular programming model, called MapReduce, which breaks computation tasks into units that can be distributed across a cluster of commodity and server-class hardware, thereby providing cost-effective horizontal scalability. This chapter introduces the various Hadoop ecosystem tools and describes their use in data analytics, where each tool plays a distinct role.

Author information

Authors and Affiliations

School of Computing Science and Engineering, VIT University, Vandalur-Kelambakkam Road, 600 127, Chennai, Tamil Nadu, India
N. Maheswari & M. Sivagami

Corresponding author

Correspondence to N. Maheswari.

Editor information

Editors and Affiliations

Department of Computing and Mathematics, University of Derby, Derby, United Kingdom
Zaigham Mahmood

About this chapter

Cite this chapter

Maheswari, N., Sivagami, M. (2016). Large-Scale Data Analytics Tools: Apache Hive, Pig, and HBase. In: Mahmood, Z. (eds) Data Science and Big Data Computing. Springer, Cham. https://doi.org/10.1007/978-3-319-31861-5_9

DOI: https://doi.org/10.1007/978-3-319-31861-5_9
Published: 06 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31859-2
Online ISBN: 978-3-319-31861-5
eBook Packages: Computer Science, Computer Science (R0)