Towards Efficient Power Management in MapReduce: Investigation of CPU-Frequencies Scaling on Power Efficiency in Hadoop

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8907)

Abstract

With increasingly inexpensive cloud storage and increasingly powerful cloud processing, the cloud has rapidly become the environment in which to store and analyze data. Most large-scale data computations in the cloud rely heavily on the MapReduce paradigm and its Hadoop implementation. Nevertheless, this exponential growth in popularity has significantly impacted power consumption in cloud infrastructures. In this paper, we focus on MapReduce and investigate the impact of dynamically scaling the frequency of compute nodes on the performance and energy consumption of a Hadoop cluster. To this end, we conduct a series of experiments to explore the implications of Dynamic Voltage and Frequency Scaling (DVFS) settings on power consumption in Hadoop clusters. By applying the existing DVFS governors (i.e., performance, powersave, ondemand, conservative and userspace) in the Hadoop cluster, we observe significant variation in the performance and power consumption of the cluster across applications: each DVFS setting is only sub-optimal across different MapReduce applications, i.e., no single setting is best for all of them. Furthermore, our results reveal that the current CPU governors do not exactly reflect their design goals and may even be ineffective at managing power consumption in Hadoop clusters. This study aims to provide a clearer understanding of the interplay between performance and power management in Hadoop clusters, and thereby offers useful insight into designing power-aware techniques for Hadoop systems.
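As a rough illustration of what switching between DVFS governors involves on a single Linux node, the sketch below reads and writes the cpufreq governor through the kernel's sysfs files. It is not the authors' experimental tooling: the function names (`read_governors`, `set_governor`, `energy_joules`) are hypothetical helpers introduced here, and the sketch assumes the standard sysfs layout under /sys/devices/system/cpu and root privileges for writes. The last helper only restates that energy is average power multiplied by runtime, which is why a slower, lower-power setting does not necessarily save energy.

```python
from pathlib import Path

# Standard Linux sysfs location of per-CPU cpufreq settings (assumed layout).
CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def _governor_files(root):
    """List the scaling_governor file of every CPU that exposes cpufreq."""
    return sorted(Path(root).glob("cpu[0-9]*/cpufreq/scaling_governor"))


def read_governors(root=CPUFREQ_ROOT):
    """Return {cpu_name: governor}, e.g. {"cpu0": "ondemand"}."""
    return {f.parent.parent.name: f.read_text().strip()
            for f in _governor_files(root)}


def set_governor(governor, root=CPUFREQ_ROOT):
    """Select `governor` on every CPU and return the CPU names changed.

    Writing to the real sysfs tree requires root privileges.
    """
    files = _governor_files(root)
    # Validate on all CPUs first so a failure leaves every CPU untouched.
    for gov_file in files:
        available_file = gov_file.with_name("scaling_available_governors")
        if available_file.exists():
            available = available_file.read_text().split()
            if governor not in available:
                cpu = gov_file.parent.parent.name
                raise ValueError(f"{governor!r} not available on {cpu}")
    for gov_file in files:
        gov_file.write_text(governor + "\n")
    return [f.parent.parent.name for f in files]


def energy_joules(avg_power_watts, runtime_seconds):
    """Energy consumed by a job: average power multiplied by runtime."""
    return avg_power_watts * runtime_seconds
```

On a real node, one might call `set_governor("powersave")` before launching a job and compare `energy_joules` for the measured power and runtime against a run under `performance`; the measurements themselves would have to come from an external power meter or similar source, which this sketch does not cover.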

Acknowledgments

This work is supported by the ANR MapReduce grant (ANR-10-SEGI-001) and the Héméra INRIA Large Wingspan-Project (see http://www.grid5000.fr/mediawiki/index.php/Hemera).

Experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see http://www.grid5000.fr/).

Corresponding author

Correspondence to Shadi Ibrahim.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Ibrahim, S., Moise, D., Chihoub, HE., Carpen-Amarie, A., Bougé, L., Antoniu, G. (2014). Towards Efficient Power Management in MapReduce: Investigation of CPU-Frequencies Scaling on Power Efficiency in Hadoop. In: Pop, F., Potop-Butucaru, M. (eds) Adaptive Resource Management and Scheduling for Cloud Computing. ARMS-CC 2014. Lecture Notes in Computer Science, vol 8907. Springer, Cham. https://doi.org/10.1007/978-3-319-13464-2_11

  • DOI: https://doi.org/10.1007/978-3-319-13464-2_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13463-5

  • Online ISBN: 978-3-319-13464-2

  • eBook Packages: Computer Science, Computer Science (R0)
