Energy optimization of representative barrier algorithms

Journal of Central South University

Abstract

Excessive energy consumption is widely recognized as a critical problem in large-scale parallel computing systems. A LogP-based energy-saving model and a frequency scaling method were proposed to reduce energy consumption analytically and systematically for two other representative barrier algorithms: the tournament barrier and the central counter barrier. Energy optimization methods for these two barrier algorithms were then implemented on a parallel computing platform. The experimental results validate the effectiveness of the energy optimization methods. Energy savings of 67.12% and 70.95% are obtained for the tournament barrier and the central counter barrier, respectively, on platforms with 2048 processes, at a performance loss of 1.55% to 8.80%. Furthermore, the LogP-based energy-saving analytical model for these two barrier algorithms is highly accurate: the predicted energy savings are within 9.67% of the results obtained by simulation.
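For readers unfamiliar with the two barriers named in the abstract, the following is a minimal illustrative sketch, not the authors' implementation. It shows a sense-reversing central counter barrier built on Python threads, and computes the number of pairwise rounds a tournament barrier needs for a given process count. The names `CentralCounterBarrier`, `tournament_rounds`, and `run_phases` are hypothetical and chosen here for illustration. The comment in `wait` marks the idle period of early arrivals, which is the kind of slack a frequency scaling method could exploit; the sketch itself performs no frequency scaling.

```python
import threading


class CentralCounterBarrier:
    """Sense-reversing central counter barrier (illustrative sketch)."""

    def __init__(self, parties):
        if parties < 1:
            raise ValueError("parties must be >= 1")
        self.parties = parties
        self.count = 0          # arrivals in the current phase
        self.sense = False      # flipped once per phase by the last arrival
        self.cond = threading.Condition()

    def wait(self):
        """Block until all parties have arrived; True for the last arrival."""
        with self.cond:
            my_sense = not self.sense   # value the flag takes on release
            self.count += 1
            if self.count == self.parties:
                # Last arrival: reset the counter and release everyone.
                self.count = 0
                self.sense = my_sense
                self.cond.notify_all()
                return True
            # Early arrivals idle here until the flag flips. This waiting
            # time is the slack an energy-saving scheme would target.
            while self.sense != my_sense:
                self.cond.wait()
            return False


def tournament_rounds(processes):
    """Rounds of pairwise matches in a tournament barrier: ceil(log2(P))."""
    if processes < 1:
        raise ValueError("processes must be >= 1")
    return (processes - 1).bit_length()


def run_phases(parties, phases):
    """Run `parties` threads through `phases` barriers.

    Returns, for every thread and phase, how many arrivals that thread
    observed right after leaving the barrier (should always be `parties`).
    """
    barrier = CentralCounterBarrier(parties)
    arrivals = [0] * phases
    lock = threading.Lock()
    observed = []

    def worker():
        for phase in range(phases):
            with lock:
                arrivals[phase] += 1
            barrier.wait()
            with lock:
                observed.append(arrivals[phase])

    threads = [threading.Thread(target=worker) for _ in range(parties)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return observed


if __name__ == "__main__":
    print(tournament_rounds(2048))
    print(set(run_phases(8, 3)))
```

With 2048 processes, as in the reported experiments, `tournament_rounds` gives 11 rounds, whereas the central counter barrier funnels all 2048 arrivals through one shared counter. In both cases processes that arrive early spend the remainder of the barrier waiting.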

Author information

Corresponding author

Correspondence to Juan Chen  (陈娟).

Additional information

Foundation item: Projects (60903044, 61170049) supported by the National Natural Science Foundation of China

About this article

Cite this article

Chen, J., Dong, Y. Energy optimization of representative barrier algorithms. J. Cent. South Univ. 19, 2823–2831 (2012). https://doi.org/10.1007/s11771-012-1348-z

  • DOI: https://doi.org/10.1007/s11771-012-1348-z
