Benchmarking Performance for Migrating a Relational Application to a Parallel Implementation

  • Conference paper
Advances in Conceptual Modeling (ER 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8823)

Abstract

Many organizations rely on relational database platforms for OLAP-style querying (aggregation and filtering) in small- to medium-sized applications. We investigate the impact of scaling up the data sizes for such queries, with the aim of illustrating the performance an organization could expect if it migrated current applications to a big data environment. This paper benchmarks the performance of Hive [20], a parallel data warehouse platform that is part of the Hadoop software stack. We set up a 4-node Hadoop cluster using Hortonworks HDP 1.3.2 [10]. We use the data generator provided by the TPC-DS benchmark [3] to generate data at different scales. We take a representative query from the TPC-DS query set and run its SQL and Hive Query Language (HiveQL) versions on a relational database installation (MySQL) and on the Hive cluster, respectively. We measure the speedup in query execution for every dataset size produced by the scale-up. Hive loads the large datasets faster than MySQL, while it is marginally slower than MySQL when loading the smaller datasets.
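
The speedup measurement mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' harness: it assumes speedup is defined as MySQL time divided by Hive time at the same scale factor, and it assumes a best-of-N timing policy. The names `time_query`, `speedup`, and `speedup_by_scale` are hypothetical, and the abstract does not give the paper's timing values, so none are reproduced here.

```python
import time
from typing import Callable, Dict


def time_query(run_query: Callable[[], object], repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs.

    `run_query` is a hypothetical callable that submits one query to
    either system; best-of-N is an assumed policy, not the paper's.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Ratio baseline / candidate; a value above 1 means the candidate is faster."""
    if candidate_seconds <= 0:
        raise ValueError("candidate time must be positive")
    return baseline_seconds / candidate_seconds


def speedup_by_scale(
    baseline: Dict[int, float], candidate: Dict[int, float]
) -> Dict[int, float]:
    """Compute the speedup for every scale factor present in both inputs."""
    return {
        scale: speedup(baseline[scale], candidate[scale])
        for scale in sorted(baseline)
        if scale in candidate
    }
```

With the MySQL timings passed as `baseline` and the Hive timings as `candidate`, each keyed by scale factor, a result below 1 at a given scale would indicate that Hive was slower there and a result above 1 that it was faster.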

References

  1. Baru, C., Bhandarkar, M., Nambiar, R., Poess, M., Rabl, T.: Setting the direction for big data benchmark standards. In: Nambiar, R., Poess, M. (eds.) TPCTC 2012. LNCS, vol. 7755, pp. 197–208. Springer, Heidelberg (2013)

  2. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Communications of the ACM 51(1), 107–113 (2008)

  3. DSGen v1.1.0, data generation tool for TPC-DS, http://www.tpc.org/tpcds/

  4. Ghazal, A., Rabl, T., Hu, M., Raab, F., Poess, M., Crolotte, A., Jacobsen, H.-A.: BigBench: Towards an Industry Standard Benchmark for Big Data Analytics (2013)

  5. GridMix program. Available in Hadoop source distribution: src/benchmarks/gridmix

  6. Gruenheid, A., Omiecinski, E., Mark, L.: Query optimization using column statistics in hive. In: Proceedings of the 15th Symposium on International Database Engineering & Applications, pp. 97–105. ACM (2011)

  7. HadoopTeraSort program. Available in Hadoop source distribution since 0.19 version: src/examples/org/apache/hadoop/examples/terasort

  8. Herodotou, H., Lim, H., Luo, G., Borisov, N., Dong, L., Cetin, F.B., Babu, S.: Starfish: A Self-tuning System for Big Data Analytics. In: CIDR, vol. 11, pp. 261–272 (2011)

  9. Hive Performance Benchmark, https://issues.apache.org/jira/browse/hive-396

  10. Hortonworks HDP 1.3.2, http://hortonworks.com/products/hdp/hdp-1-3/#overview

  11. Hortonworks Stinger Initiative, http://hortonworks.com/labs/stinger/

  12. Huang, S., Huang, J., Dai, J., Xie, T., Huang, B.: The HiBench benchmark suite: Characterization of the MapReduce-based data analysis. In: 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW), pp. 41–51. IEEE (2010)

  13. Nambiar, R.O., Poess, M.: The making of TPC-DS. In: Proceedings of the 32nd International Conference on Very Large Data Bases, pp. 1049–1058. VLDB Endowment (2006)

  14. Pansare, N., Cai, Z.: Using Hive to perform medium-scale data analysis (2010)

  15. Pavlo, A., Paulson, E., Rasin, A., Abadi, D.J., DeWitt, D.J., Madden, S., Stonebraker, M.: A comparison of approaches to large-scale data analysis. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pp. 165–178. ACM (2009)

  16. Running the TPC-H benchmark on Hive, https://issues.apache.org/jira/secure/attachment/12416257/TPC-H_on_Hive_2009-08-11.pdf

  17. Sort program. Available in Hadoop source distribution: src/examples/org/apache/hadoop/examples/sort

  18. Shi, Y., Meng, X., Zhao, J., Hu, X., Liu, B., Wang, H.: Benchmarking cloud-based data management systems. In: Proceedings of the Second International Workshop on Cloud Data Management, pp. 47–54. ACM (2010)

  19. TPC-DS benchmarking standard, http://www.tpc.org/tpcds/spec/tpcds_1.1.0.pdf

  20. Thusoo, A., Sarma, J.S., Jain, N., Shao, Z., Chakka, P., Anthony, S., Liu, H., Wyckoff, P., Murthy, R.: Hive: A warehousing solution over a map-reduce framework. Proceedings of the VLDB Endowment 2(2), 1626–1629 (2009)

  21. White, T.: Hadoop: The definitive guide. O’Reilly (2012)

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Gadiraju, K.K., Davis, K.C., Talaga, P.G. (2014). Benchmarking Performance for Migrating a Relational Application to a Parallel Implementation. In: Indulska, M., Purao, S. (eds) Advances in Conceptual Modeling. ER 2014. Lecture Notes in Computer Science, vol 8823. Springer, Cham. https://doi.org/10.1007/978-3-319-12256-4_6

  • DOI: https://doi.org/10.1007/978-3-319-12256-4_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12255-7

  • Online ISBN: 978-3-319-12256-4

  • eBook Packages: Computer Science, Computer Science (R0)
