Popularity-based covering sets for energy proportionality in shared-nothing clusters

Kim, Minki; Cho, Haengrae

doi:10.1007/s11227-017-2197-1

Published: 25 November 2017

Volume 74, pages 1885–1910 (2018)

The Journal of Supercomputing

Abstract

Energy management for large-scale clusters has received significant research attention in recent years. The principle of energy proportionality states that energy can be saved by activating only a subset of cluster nodes, in proportion to the current load. However, achieving energy proportionality in shared-nothing clusters is challenging, because arbitrarily deactivating nodes can make some data unavailable. In this paper, we propose a new algorithm, named popularity-based covering sets (PCS), to achieve energy proportionality in large-scale shared-nothing clusters. PCS determines the set of active nodes dynamically to meet two design goals: (a) guaranteeing a minimum level of availability for every data item so that any job can execute promptly, and (b) providing more replicas for popular data to mitigate contention on those data. This differs from previous studies, in which some data may become unavailable or every data item receives the same number of replicas. Furthermore, PCS is rack-aware and can thus reduce the energy consumption of power-hungry rack components. Experimental results indicate that PCS improves overall energy savings by up to 62% compared to previous algorithms, without significant performance loss.
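
The abstract describes PCS only at a high level. The Python below is a minimal sketch of the general idea under assumptions made purely for illustration; it is not the authors' algorithm. The replica-demand rule, the greedy ordering, and the names `required_replicas`, `choose_active_nodes`, `min_avail` and `max_extra` are all hypothetical. The sketch keeps activating nodes until every block has its required number of active replicas (at least one for every block, more for popular ones) and the active count reaches a load-proportional budget, preferring nodes in racks that are already powered on.

```python
from collections import defaultdict
from math import ceil


def required_replicas(popularity, replicas, min_avail=1, max_extra=1):
    """Hypothetical rule: every block needs min_avail active replicas; popular
    blocks need up to max_extra more, scaled by popularity relative to the
    hottest block and capped by the number of replicas that exist."""
    top = max(popularity.values(), default=0) or 1
    need = {}
    for block, nodes in replicas.items():
        extra = round(max_extra * popularity.get(block, 0) / top)
        need[block] = min(len(nodes), min_avail + extra)
    return need


def choose_active_nodes(replicas, rack_of, popularity, load,
                        min_avail=1, max_extra=1):
    """Greedy, rack-aware sketch of a popularity-weighted covering set.

    replicas:   block id -> set of node ids holding a replica
    rack_of:    node id -> rack id
    popularity: block id -> recent access count
    load:       current load as a fraction of cluster capacity
    """
    need = required_replicas(popularity, replicas, min_avail, max_extra)
    target = ceil(load * len(rack_of))  # load-proportional node budget
    active, active_racks = set(), set()
    covered = defaultdict(int)  # block id -> replicas on active nodes

    def gain(node):
        # Unmet replica demand this node would satisfy, weighted by popularity.
        return sum(1 + popularity.get(b, 0)
                   for b, nodes in replicas.items()
                   if node in nodes and covered[b] < need[b])

    while True:
        unmet = any(covered[b] < need[b] for b in replicas)
        if not unmet and len(active) >= target:
            break
        candidates = sorted(n for n in rack_of if n not in active)
        if not candidates:
            break
        # Prefer nodes that satisfy unmet demand, then nodes in racks that are
        # already powered on, then the largest popularity-weighted gain.
        best = max(candidates,
                   key=lambda n: (gain(n) > 0, rack_of[n] in active_racks, gain(n)))
        active.add(best)
        active_racks.add(rack_of[best])
        for b, nodes in replicas.items():
            if best in nodes:
                covered[b] += 1
    return active


# Toy cluster: 3 racks of 2 nodes; block "A" is far hotter than "B" and "C".
replicas = {"A": {0, 2, 4}, "B": {1, 3, 5}, "C": {0, 3, 5}}
rack_of = {0: "r0", 1: "r0", 2: "r1", 3: "r1", 4: "r2", 5: "r2"}
popularity = {"A": 100, "B": 5, "C": 5}

active = choose_active_nodes(replicas, rack_of, popularity, load=0.3)
print(sorted(active))  # [0, 1, 2]
```

In this toy run the hot block A ends up with two active replicas while B and C keep one each, and rack r2 can be powered down entirely. A greedy heuristic is used because choosing a minimum set of nodes that covers every block is an instance of set cover, which is NP-hard in general.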

Acknowledgements

The authors would like to thank the anonymous reviewers for their helpful comments. This work was supported by the Yeungnam University Research Grant.

Author information

Authors and Affiliations

Department of Computer Engineering, Yeungnam University, Gyeongsan, Republic of Korea
Minki Kim & Haengrae Cho

Corresponding author

Correspondence to Haengrae Cho.

About this article

Cite this article

Kim, M., Cho, H. Popularity-based covering sets for energy proportionality in shared-nothing clusters. J Supercomput 74, 1885–1910 (2018). https://doi.org/10.1007/s11227-017-2197-1

Issue Date: May 2018
DOI: https://doi.org/10.1007/s11227-017-2197-1