Energy optimization of representative barrier algorithms

Chen, Juan; Dong, Yong

doi:10.1007/s11771-012-1348-z

Published: 04 October 2012

Journal of Central South University, Volume 19, pages 2823–2831 (2012)

Juan Chen (陈娟)¹ &
Yong Dong (董勇)¹

Abstract

Excessive energy consumption is widely recognized as a critical problem in large-scale parallel computing systems. A LogP-based energy-saving model and a frequency scaling method are proposed to reduce, analytically and systematically, the energy consumption of two other representative barrier algorithms: the tournament barrier and the central counter barrier. The energy optimization methods for these two barrier algorithms were implemented on a parallel computing platform, and the experimental results validate their effectiveness. On platforms with 2048 processes, energy savings of 67.12% and 70.95% are obtained for the tournament barrier and the central counter barrier, respectively, with a performance loss of 1.55%–8.80%. Furthermore, the LogP-based energy-saving analytical model for these two barrier algorithms is highly accurate: the predicted energy savings are within 9.67% of the results obtained by simulation.

Author information

Authors and Affiliations

School of Computer, National University of Defense Technology, Changsha, 410073, China
Juan Chen (陈娟) & Yong Dong (董勇)

Corresponding author

Correspondence to Juan Chen (陈娟).

Additional information

Foundation item: Projects (60903044, 61170049) supported by the National Natural Science Foundation of China

About this article

Cite this article

Chen, J., Dong, Y. Energy optimization of representative barrier algorithms. J. Cent. South Univ. 19, 2823–2831 (2012). https://doi.org/10.1007/s11771-012-1348-z

Received: 21 September 2011
Accepted: 08 November 2011
Published: 04 October 2012
Issue Date: October 2012
DOI: https://doi.org/10.1007/s11771-012-1348-z
