PRIMEBALL: A Parallel Processing Framework Benchmark for Big Data Applications in the Cloud

  • Jaume Ferrarons
  • Mulu Adhana
  • Carlos Colmenares
  • Sandra Pietrowska
  • Fadila Bentayeb
  • Jérôme Darmont
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8391)

Abstract

In this position paper, we outline the specifications of a novel benchmark for comparing parallel processing frameworks in the context of big data applications hosted in the cloud. We aim to fill several gaps in existing cloud data processing benchmarks, which lack a real-life context for their processes and thus lose relevance when used to assess performance for real applications. Hence, we propose a fictitious news site hosted in the cloud and managed by the framework under analysis, together with several objective use case scenarios and measures for evaluating system performance. The main strengths of our benchmark definition are its parallelization capabilities supporting cloud features and its big data properties.

Keywords

Benchmark, Cloud Computing, Parallel Processing Framework, Big Data, Real Data



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

All authors: Université de Lyon (Laboratoire ERIC), Université Lumière Lyon 2, Bron Cedex, France
