Self-Balancing Job Parallelism and Throughput in Hadoop

  • Bo Zhang
  • Filip Křikava (corresponding author)
  • Romain Rouvoy
  • Lionel Seinturier
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9687)


In a Hadoop cluster, the performance and the resource consumption of MapReduce jobs depend not only on the characteristics of the applications and workloads, but also on the appropriate setting of Hadoop configuration parameters. However, when the job workloads are not known a priori or evolve over time, a static configuration may quickly waste computing resources and consequently degrade performance. In this paper, we therefore propose an on-line approach that dynamically reconfigures Hadoop at runtime. Concretely, we focus on balancing job parallelism and throughput by adjusting the memory configuration of the Hadoop capacity scheduler. Our evaluation shows that the approach outperforms vanilla Hadoop deployments by up to 40 % and the best statically profiled configurations by up to 13 %.
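To make the idea of runtime reconfiguration concrete, the following is a minimal sketch of one step of a feedback loop, not the authors' controller. It assumes that the tuned memory setting is the share of cluster memory reserved for application masters (in YARN's Capacity Scheduler this would correspond to the property `yarn.scheduler.capacity.maximum-am-resource-percent`; choosing it here is an assumption of this sketch, since the abstract does not name the parameter). The metric names, thresholds, and step size are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSample:
    """One monitoring sample. All fields are hypothetical metrics."""
    memory_utilization: float  # fraction of cluster memory in use (0.0 to 1.0)
    pending_jobs: int          # jobs waiting for an application master
    pending_tasks: int         # task containers waiting for memory


def next_am_share(current: float, sample: ClusterSample, step: float = 0.05,
                  low: float = 0.05, high: float = 0.95,
                  saturation: float = 0.9) -> float:
    """Return the next application-master memory share (assumed policy)."""
    saturated = sample.memory_utilization >= saturation
    if sample.pending_jobs > 0 and not saturated:
        proposed = current + step   # admit more jobs: raise parallelism
    elif sample.pending_tasks > 0 and saturated:
        proposed = current - step   # leave more memory for tasks: raise throughput
    else:
        proposed = current          # no clear signal: hold the setting
    # Clamp to the allowed range and round away floating-point noise.
    return round(min(high, max(low, proposed)), 4)


if __name__ == "__main__":
    share = 0.10
    for s in [ClusterSample(0.50, 3, 0), ClusterSample(0.95, 0, 10)]:
        share = next_am_share(share, s)
        print(share)
```

The step is kept small and the result is clamped so that a noisy sample cannot swing the configuration to an extreme; a real deployment would also need a way to apply the new value to the running scheduler, which this sketch leaves out.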


Keywords: Completion Time, Memory Utilization, Feedback Control Loop, Hadoop Cluster, Workload Dynamic



This work is partially supported by the Datalyse project. Experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several universities as well as other organizations (see



Copyright information

© IFIP International Federation for Information Processing 2016

Authors and Affiliations

  • Bo Zhang (1)
  • Filip Křikava (2)
  • Romain Rouvoy (1)
  • Lionel Seinturier (1)

  1. University of Lille / Inria, Villeneuve-d’Ascq, France
  2. Czech Technical University, Prague, Czech Republic
