PRIMEBALL: A Parallel Processing Framework Benchmark for Big Data Applications in the Cloud

  • Conference paper
Performance Characterization and Benchmarking (TPCTC 2013)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8391)

Abstract

In this position paper, we draw up the specifications for a novel benchmark for comparing parallel processing frameworks in the context of big data applications hosted in the cloud. We aim to fill several gaps in existing cloud data processing benchmarks, which lack a real-life context for their processes and thus lose relevance when assessing performance for real applications. Hence, we propose a fictitious news site hosted in the cloud that is to be managed by the framework under analysis, together with several objective use case scenarios and measures for evaluating system performance. The main strengths of our benchmark definition are parallelization capabilities supporting cloud features and big data properties.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Ferrarons, J., Adhana, M., Colmenares, C., Pietrowska, S., Bentayeb, F., Darmont, J. (2014). PRIMEBALL: A Parallel Processing Framework Benchmark for Big Data Applications in the Cloud. In: Nambiar, R., Poess, M. (eds) Performance Characterization and Benchmarking. TPCTC 2013. Lecture Notes in Computer Science, vol 8391. Springer, Cham. https://doi.org/10.1007/978-3-319-04936-6_8

  • DOI: https://doi.org/10.1007/978-3-319-04936-6_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-04935-9

  • Online ISBN: 978-3-319-04936-6

  • eBook Packages: Computer Science, Computer Science (R0)
