Performance Analysis of Queries with Hive Optimized Data Models

  • Conference paper
Proceedings of ICRIC 2019

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 597)

Abstract

Hive, a data warehouse tool, processes structured data in Hadoop. It runs on top of Hadoop and helps to analyze, query, and review Big Data. The execution time of queries has been drastically reduced by using Hadoop MapReduce. This paper presents a detailed comparison of data model optimization techniques, such as partitioning and bucketing, for improving the processing time of Hive queries. The implementation uses data from the New York Police Portal, with AWS services for storage and the Hive tool in the Hadoop ecosystem for querying the data. The use of partitioning showed a remarkable improvement in execution time.
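
The intuition behind the two techniques named in the abstract can be sketched in a few lines of Python. This is a toy in-memory model, not Hive and not the authors' implementation: the sample rows, the column layout (complaint id, borough, offense), and all function names are hypothetical, and `zlib.crc32` merely stands in for whatever hash function a real engine applies when bucketing. Partitioning groups rows by the value of one column so that a filter on that column reads only the matching group, while bucketing spreads rows over a fixed number of buckets by hashing a column.

```python
import zlib
from collections import defaultdict

# Hypothetical sample rows: (complaint_id, borough, offense).
ROWS = [
    (1, "BRONX", "THEFT"),
    (2, "QUEENS", "ASSAULT"),
    (3, "BRONX", "FRAUD"),
    (4, "BROOKLYN", "THEFT"),
    (5, "QUEENS", "THEFT"),
]


def partition_rows(rows, key_index):
    """Group rows by the value of one column (one group per distinct value)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key_index]].append(row)
    return partitions


def bucket_of(value, num_buckets):
    """Map a value to a bucket number with a deterministic hash (assumed: CRC32)."""
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def bucket_rows(rows, key_index, num_buckets):
    """Distribute rows over a fixed number of buckets by hashing one column."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_of(row[key_index], num_buckets)].append(row)
    return buckets


def count_full_scan(rows, borough):
    """Count matching rows by reading every row; returns (matches, rows_read)."""
    matches = sum(1 for row in rows if row[1] == borough)
    return matches, len(rows)


def count_partitioned(partitions, borough):
    """Count matching rows by reading only one partition; returns (matches, rows_read)."""
    part = partitions.get(borough, [])
    return len(part), len(part)


partitions = partition_rows(ROWS, key_index=1)
buckets = bucket_rows(ROWS, key_index=0, num_buckets=2)
```

With these five sample rows, `count_full_scan(ROWS, "BRONX")` reads all five rows to find two matches, whereas `count_partitioned(partitions, "BRONX")` reads only the two rows stored under that borough. Reading fewer rows is the effect that partition pruning relies on; the execution times reported in the paper come from Hive itself, not from a model like this one.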

Corresponding author

Correspondence to Meghna Sharma.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Sharma, M., Kaur, J. (2020). Performance Analysis of Queries with Hive Optimized Data Models. In: Singh, P., Kar, A., Singh, Y., Kolekar, M., Tanwar, S. (eds) Proceedings of ICRIC 2019. Lecture Notes in Electrical Engineering, vol 597. Springer, Cham. https://doi.org/10.1007/978-3-030-29407-6_49

  • DOI: https://doi.org/10.1007/978-3-030-29407-6_49

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29406-9

  • Online ISBN: 978-3-030-29407-6

  • eBook Packages: Engineering, Engineering (R0)
