Abstract
Excessive energy consumption is widely recognized as a critical problem in large-scale parallel computing systems. A LogP-based energy-saving model and a frequency scaling method were proposed to reduce energy consumption analytically and systematically for two further representative barrier algorithms: the tournament barrier and the central counter barrier. Energy optimization methods for these two barrier algorithms were also implemented on a parallel computing platform. The experimental results validate the effectiveness of these methods: energy savings of 67.12% and 70.95% are obtained for the tournament barrier and the central counter barrier, respectively, on platforms with 2048 processes, at a performance loss of 1.55%-8.80%. Furthermore, the LogP-based energy-saving analytical model for these two barrier algorithms is highly accurate, as the predicted energy savings are within 9.67% of the results obtained by simulation.
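To illustrate where a barrier leaves room for frequency scaling, the following minimal sketch counts how many processes sit idle after each arrival round of a generic tournament barrier. It is not the model from the article: the function names are hypothetical, the pairing scheme is the textbook one (the loser of each pair waits and the winner advances), and the assumption that waiting processes are the candidates for a lower frequency is an illustrative reading, not a statement of the authors' method.

```python
def tournament_rounds(num_procs: int) -> int:
    """Arrival rounds needed by a textbook tournament barrier: ceil(log2(P))."""
    if num_procs < 1:
        raise ValueError("num_procs must be positive")
    # (P - 1).bit_length() equals ceil(log2(P)) for every P >= 1.
    return (num_procs - 1).bit_length()


def waiting_per_round(num_procs: int) -> list[int]:
    """Number of processes already waiting after each arrival round.

    Assumption: in every round the active processes pair up, one of each
    pair drops out and waits, and an unpaired process advances with a bye.
    """
    if num_procs < 1:
        raise ValueError("num_procs must be positive")
    active = num_procs
    waiting = []
    while active > 1:
        active = (active + 1) // 2  # winners (plus a possible bye) stay active
        waiting.append(num_procs - active)
    return waiting


if __name__ == "__main__":
    procs = 2048  # process count used in the reported experiments
    idle = waiting_per_round(procs)
    print("rounds:", tournament_rounds(procs))
    print("waiting after first round:", idle[0])
    print("waiting after last round:", idle[-1])
```

Under these assumptions, half of the processes are already idle after the first round, and only one process is still active after the last, which is the kind of slack a frequency scaling method could exploit.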
Additional information
Foundation item: Projects(60903044, 61170049) supported by National Natural Science Foundation of China
Cite this article
Chen, J., Dong, Y. Energy optimization of representative barrier algorithms. J. Cent. South Univ. 19, 2823–2831 (2012). https://doi.org/10.1007/s11771-012-1348-z
DOI: https://doi.org/10.1007/s11771-012-1348-z