Big Data Workloads Drawn from Real-Time Analytics Scenarios Across Three Deployed Solutions

  • Tao Zhong
  • Kshitij Doshi
  • Xi Tang
  • Ting Lou
  • Zhongyan Lu
  • Hong Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8585)

Abstract

Big Data solution vendors and customers alike need a few credible benchmark workloads with which to demonstrate or optimize the performance, elasticity, efficiency, and robustness of the solutions they create or deploy. Many new problems require extracting immediately actionable intelligence from torrents of data, so a good application-level benchmark must build both real-time (low-latency) and high-throughput metrics into its design. It should also impose loads that reflect the realities of complex, interdependent mixes of storage and analysis operations. This short paper describes three different application-level scenarios. In each scenario, a Big Data solution answers a subset of requests in real time, while requests that do not need real-time responses are completed at a high rate in the background, all in the presence of massive inflows of new data. The solutions from which we draw these scenarios are already deployed or in pre-deployment testing. They can therefore serve as good models for designing a realistic Big Data workload that is meaningful to customers who must meet real-time needs while balancing high-availability and service-rate requirements.
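As a rough illustration of the two kinds of metric the abstract calls for, the following Python sketch summarizes one completed run of a mixed workload. It reports a tail-latency percentile for the latency-sensitive (real-time) requests and a completion rate for the background requests. The `Request` record, the `summarize` function, the nearest-rank 99th percentile, and the way the throughput window is measured are all illustrative assumptions, not the metrics or harness defined by the authors.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Request:
    # Hypothetical record of one completed request in a benchmark run.
    kind: str       # "realtime" (latency-sensitive) or "background" (throughput-oriented)
    start_s: float  # submission time in seconds
    end_s: float    # completion time in seconds


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile of a non-empty list, with 0 < p <= 100."""
    if not values or not 0 < p <= 100:
        raise ValueError("need a non-empty list and 0 < p <= 100")
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank, at least 1 since p > 0
    return ordered[rank - 1]


def summarize(requests: Iterable[Request], p: float = 99.0) -> dict[str, float]:
    """Report real-time tail latency and background throughput as separate numbers."""
    reqs = list(requests)
    latencies = [r.end_s - r.start_s for r in reqs if r.kind == "realtime"]
    background = [r for r in reqs if r.kind == "background"]
    if not latencies or not background:
        raise ValueError("need at least one realtime and one background request")
    # Assumed throughput window: first background submission to last background completion.
    window = max(r.end_s for r in background) - min(r.start_s for r in background)
    if window <= 0:
        raise ValueError("background requests must span a positive time window")
    return {
        "realtime_latency_pct_s": percentile(latencies, p),
        "background_throughput_per_s": len(background) / window,
    }


if __name__ == "__main__":
    # Small synthetic run: three real-time requests and four back-to-back background requests.
    run = [
        Request("realtime", 1.0, 1.01),
        Request("realtime", 1.0, 1.02),
        Request("realtime", 1.0, 1.05),
        Request("background", 0.0, 0.5),
        Request("background", 0.5, 1.0),
        Request("background", 1.0, 1.5),
        Request("background", 1.5, 2.0),
    ]
    print(summarize(run))
```

Keeping the two numbers separate, rather than blending them into one score, prevents a fast background completion rate from masking a poor real-time latency tail.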

Keywords

Real-time analytics; Data processing; Performance; Latency; Transactions; Workload; Benchmark; Databases

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Tao Zhong (1)
  • Kshitij Doshi (1)
  • Xi Tang (1)
  • Ting Lou (1)
  • Zhongyan Lu (1)
  • Hong Li (1)
  1. Software and Services Group, Intel, Beijing, China