Abstract
Hive, a data warehouse tool, processes structured data in Hadoop. It runs on top of Hadoop and helps to analyze, query, and review Big Data. Using Hadoop MapReduce has drastically reduced query execution time. This paper presents a detailed comparison of data-model optimization techniques, such as partitioning and bucketing, for improving the processing time of Hive queries. The implementation uses data from the New York Police Portal, with AWS services for storage and the Hive tool in the Hadoop ecosystem for querying. Partitioning showed a remarkable improvement in execution time.
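The idea behind the two data models named in the abstract can be illustrated with a toy, in-memory sketch. This is not Hive code and not the authors' implementation: the rows, the column names (complaint_id, borough, offense), and the helper functions are all hypothetical, and the modulo rule is a simplified stand-in for hash-based bucketing. The sketch only shows why a filter on the partition column can skip data that a full scan has to read.

```python
from collections import defaultdict

# Hypothetical rows: (complaint_id, borough, offense). Not data from the paper.
ROWS = [
    (1, "BRONX", "THEFT"),
    (2, "QUEENS", "ASSAULT"),
    (3, "BRONX", "FRAUD"),
    (4, "BROOKLYN", "THEFT"),
    (5, "QUEENS", "THEFT"),
    (6, "BRONX", "ASSAULT"),
]


def partition_rows(rows):
    """Group rows by the partition column (borough), one group per value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[1]].append(row)
    return dict(partitions)


def bucket_rows(rows, num_buckets):
    """Spread rows over a fixed number of buckets using key modulo num_buckets.

    Simplified stand-in for hash-based bucketing on an integer key.
    """
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[row[0] % num_buckets].append(row)
    return buckets


def scan_full(rows, borough):
    """Unpartitioned layout: every row is read to answer the filter."""
    matches = [r for r in rows if r[1] == borough]
    return matches, len(rows)


def scan_partitioned(partitions, borough):
    """Partitioned layout: only the matching partition is read."""
    part = partitions.get(borough, [])
    return list(part), len(part)
```

In this sketch, both scans return the same rows for a given borough, but the partitioned scan reads only the rows stored under that borough, which is the effect partition pruning aims for.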
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Sharma, M., Kaur, J. (2020). Performance Analysis of Queries with Hive Optimized Data Models. In: Singh, P., Kar, A., Singh, Y., Kolekar, M., Tanwar, S. (eds) Proceedings of ICRIC 2019. Lecture Notes in Electrical Engineering, vol 597. Springer, Cham. https://doi.org/10.1007/978-3-030-29407-6_49
DOI: https://doi.org/10.1007/978-3-030-29407-6_49
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29406-9
Online ISBN: 978-3-030-29407-6
eBook Packages: Engineering, Engineering (R0)