Encyclopedia of Big Data Technologies

2019 Edition
Editors: Sherif Sakr, Albert Y. Zomaya

Hive

  • Alan F. Gates
Reference work entry
DOI: https://doi.org/10.1007/978-3-319-77525-8_250

Definitions

Apache Hive is data warehousing software that facilitates reading, writing, and managing large data sets residing in distributed storage using SQL (Apache Hive PMC 2017).

Overview

Hive enables data warehousing in the Apache Hadoop ecosystem. It can run in traditional Hadoop clusters or in cloud environments. It can work with data sets as large as multiple petabytes. Initially Hive was used mainly for ETL and batch processing. While still supporting these use cases, it has evolved to also support data warehousing use cases such as reporting, interactive queries, and business intelligence. This evolution has been accomplished by adopting many common data warehousing techniques while adapting those techniques to the Hadoop ecosystem. It is implemented in Java.

Architecture

Hive’s architecture is shown in figure 1. Not all of the components in the diagram are required in every installation. LLAP and HiveServer2 are optional; the Metastore can be run embedded in HiveServer2 or...

References

  1. Apache Hive (2017) http://hive.apache.org/
  2. Apache Hive SQL Conformance (2017) https://cwiki.apache.org/confluence/display/Hive/Apache+Hive+SQL+Conformance. Accessed 11 Nov 2017
  3. Boncz P et al (2005) MonetDB/X100: hyper-pipelining query execution. In: Proceedings of the 2005 CIDR conference, Asilomar, pp 225–237
  4. Huai Y et al (2014) Major technical advancements in Apache Hive. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data, Snowbird, Utah, 22–27 June 2014
  5. Saha B et al (2015) Apache Tez: a unifying framework for modeling and building data processing applications. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data, Melbourne, 31 May–4 June 2015
  6. Shanklin C (2014) Benchmarking Apache Hive 13 for enterprise Hadoop. https://hortonworks.com/blog/benchmarking-apache-hive-13-enterprise-hadoop/. Accessed 9 Nov 2017
  7. Vavilapalli V et al (2013) Apache Hadoop YARN: yet another resource negotiator. In: Proceedings of the 4th annual symposium on cloud computing, Santa Clara, 1–3 Oct 2013
  8. Zaharia M (2010) Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX conference on hot topics in cloud computing, Boston, 22–25 June 2010

Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  1. Hortonworks, Santa Clara, USA

Section editors and affiliations

  • Yuanyuan Tian (1)
  • Fatma Özcan (1)
  1. IBM Almaden Research Center, San Jose, USA