D-SPACE4Cloud: A Design Tool for Big Data Applications

  • Michele Ciavotta
  • Eugenio Gianniti
  • Danilo Ardagna
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10048)

Abstract

Recent years have seen a steep rise in data generation worldwide, along with the development and widespread adoption of several software projects targeting the Big Data paradigm. Many companies now engage in Big Data analytics as part of their core business activities; nonetheless, there are no tools or techniques to support the design of the infrastructure configuration backing such systems. In particular, this paper focuses on Cloud-deployed clusters, which represent a cost-effective alternative to on-premises installations. We propose a novel tool that integrates a battery of optimization and prediction techniques to efficiently assess alternative resource configurations and determine the minimum-cost cluster deployment satisfying Quality of Service constraints. Further, an experimental campaign conducted on real systems shows the validity and relevance of the proposed method.

Keywords

MapReduce, Optimization, Queueing networks

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Michele Ciavotta (1)
  • Eugenio Gianniti (1)
  • Danilo Ardagna (1)

  1. Dip. di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
