Towards Efficient Power Management in MapReduce: Investigation of CPU-Frequencies Scaling on Power Efficiency in Hadoop

  • Shadi Ibrahim
  • Diana Moise
  • Houssem-Eddine Chihoub
  • Alexandra Carpen-Amarie
  • Luc Bougé
  • Gabriel Antoniu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8907)

Abstract

With increasingly inexpensive cloud storage and increasingly powerful cloud processing, the cloud has rapidly become the environment of choice for storing and analyzing data. Most large-scale data computations in the cloud rely heavily on the MapReduce paradigm and its Hadoop implementation. However, this exponential growth in popularity has had a significant impact on power consumption in cloud infrastructures. In this paper, we focus on MapReduce and investigate how dynamically scaling the frequency of compute nodes affects the performance and energy consumption of a Hadoop cluster. To this end, we conduct a series of experiments to explore the implications of Dynamic Voltage and Frequency Scaling (DVFS) settings for power consumption in Hadoop clusters. By applying the existing DVFS governors (i.e., performance, powersave, ondemand, conservative and userspace) in the Hadoop cluster, we observe significant variation in the performance and power consumption of the cluster across applications: no single DVFS setting is optimal for all MapReduce applications. Furthermore, our results reveal that the current CPU governors do not exactly reflect their design goals and may even be ineffective at managing power consumption in Hadoop clusters. This study aims to provide a clearer understanding of the interplay between performance and power management in Hadoop clusters, and therefore offers useful insight into designing power-aware techniques for Hadoop systems.
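As an illustration of what switching between the governors named above involves on a single node, the following is a minimal sketch that reads and sets the governor through the standard Linux cpufreq sysfs files. It is not the experimental tooling used in the paper. The helper names (`governor_file`, `read_governor`, `set_governor`) and the `root` parameter are assumptions introduced for this example; writing to the real sysfs tree typically requires root privileges, and the set of governors actually available depends on the CPU frequency driver in use.

```python
from pathlib import Path

# Governors listed in the abstract; a given machine may expose only some of them.
GOVERNORS = ("performance", "powersave", "ondemand", "conservative", "userspace")

# Standard location of per-CPU cpufreq files on Linux.
SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")


def governor_file(cpu: int, root: Path = SYSFS_CPU_ROOT) -> Path:
    """Return the path of the scaling_governor file for one CPU (hypothetical helper)."""
    return root / f"cpu{cpu}" / "cpufreq" / "scaling_governor"


def read_governor(cpu: int, root: Path = SYSFS_CPU_ROOT) -> str:
    """Read the name of the governor currently active on the given CPU."""
    return governor_file(cpu, root).read_text().strip()


def set_governor(cpu: int, name: str, root: Path = SYSFS_CPU_ROOT) -> None:
    """Request a governor for the given CPU; rejects names outside GOVERNORS."""
    if name not in GOVERNORS:
        raise ValueError(f"unknown governor: {name!r}")
    governor_file(cpu, root).write_text(name + "\n")
```

The `root` parameter exists only so that the helpers can be pointed at a scratch directory for testing instead of the live `/sys` tree.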

Keywords

MapReduce, Hadoop, Power management, DVFS, Governors

Notes

Acknowledgments

This work is supported by the ANR MapReduce grant (ANR-10-SEGI-001) and the Héméra INRIA Large Wingspan-Project (see http://www.grid5000.fr/mediawiki/index.php/Hemera).

Experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see http://www.grid5000.fr/).

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Shadi Ibrahim (1), email author
  • Diana Moise (2)
  • Houssem-Eddine Chihoub (3)
  • Alexandra Carpen-Amarie (4)
  • Luc Bougé (5)
  • Gabriel Antoniu (1)

  1. Inria, Rennes Bretagne Atlantique Research Center, Rennes, France
  2. InIT Cloud Computing Lab, ZHAW Winterthur, Winterthur, Switzerland
  3. Inria, Sophia Antipolis Research Center, Sophia Antipolis, France
  4. Vienna University of Technology, Vienna, Austria
  5. ENS Rennes/IRISA, Rennes, France
