Popularity-based covering sets for energy proportionality in shared-nothing clusters

The Journal of Supercomputing

Abstract

Energy management for large-scale clusters has been the subject of significant research attention in recent years. The principle of energy proportionality states that we can save energy by activating only a subset of cluster nodes, in proportion to the current load. However, achieving energy proportionality in shared-nothing clusters is challenging, because arbitrarily deactivating nodes would make some data unavailable. In this paper, we propose a new algorithm, named popularity-based covering sets (PCS), to achieve energy proportionality in large-scale shared-nothing clusters. PCS determines the set of active nodes dynamically to achieve two design goals: (a) guaranteeing a minimum level of availability for every data item so that any job can execute promptly, and (b) providing more replicas for popular data to mitigate contention on that data. This differs from previous studies, in which some data may become unavailable or every data item receives the same number of replicas. Furthermore, PCS is rack-aware and can therefore reduce the energy consumption of power-hungry rack components. Experimental results indicate that PCS improves overall energy savings by up to 62% compared to previous algorithms without significant performance loss.
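
The two design goals can be illustrated with a small sketch. The code below is not the PCS algorithm from the paper, whose details are not given in the abstract; it is a generic greedy set multi-cover heuristic written under stated assumptions. The function name `choose_active_nodes`, the parameters `min_replicas`, `hot_threshold`, and `hot_replicas`, and the block and node names are all hypothetical, and rack-awareness is not modeled.

```python
from typing import Dict, List, Set


def choose_active_nodes(
    placement: Dict[str, Set[str]],   # block -> nodes holding a replica (assumed input)
    popularity: Dict[str, int],       # block -> access count (assumed input)
    min_replicas: int = 1,            # hypothetical availability floor
    hot_threshold: int = 10,          # hypothetical cutoff for "popular" blocks
    hot_replicas: int = 2,            # hypothetical replica target for popular blocks
) -> List[str]:
    """Greedily pick nodes until every block has enough active replicas."""
    # Required active replicas per block: popular blocks need more, capped by
    # the number of replicas that actually exist.
    need: Dict[str, int] = {}
    for block, nodes in placement.items():
        hot = popularity.get(block, 0) >= hot_threshold
        need[block] = min(hot_replicas if hot else min_replicas, len(nodes))

    # Invert the placement map: node -> blocks stored on it.
    stored: Dict[str, Set[str]] = {}
    for block, nodes in placement.items():
        for node in nodes:
            stored.setdefault(node, set()).add(block)

    active: List[str] = []
    candidates = set(stored)
    while any(count > 0 for count in need.values()):
        # Choose the node that satisfies the most outstanding requirements;
        # iterating in sorted order makes ties resolve to the smallest name.
        best = max(
            sorted(candidates),
            key=lambda node: sum(1 for b in stored[node] if need[b] > 0),
        )
        active.append(best)
        candidates.remove(best)
        for block in stored[best]:
            if need[block] > 0:
                need[block] -= 1
    return active


# Toy cluster: three blocks, each replicated on two of three nodes.
placement = {"a": {"n1", "n2"}, "b": {"n2", "n3"}, "c": {"n3", "n1"}}
popularity = {"a": 50}  # only block "a" is popular
active = choose_active_nodes(placement, popularity)
print(active)  # ['n1', 'n2']
```

In this toy cluster, node `n3` can stay powered down: the popular block `a` has two active replicas, and blocks `b` and `c` each keep one. A greedy choice like this does not guarantee a minimum-size active set; it only shows how a per-item replica target can drive node selection.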

Acknowledgements

The author would like to thank the anonymous reviewers for their helpful comments. This work was supported by the Yeungnam University Research Grant.

Author information

Correspondence to Haengrae Cho.

Cite this article

Kim, M., Cho, H. Popularity-based covering sets for energy proportionality in shared-nothing clusters. J Supercomput 74, 1885–1910 (2018). https://doi.org/10.1007/s11227-017-2197-1

  • DOI: https://doi.org/10.1007/s11227-017-2197-1