Abstract
Barrier is a collective operation that many scientific applications and parallel libraries use for synchronization. Typically, a Barrier operation is implemented by exchanging short data messages that require demultiplexing, which adds undesired latency to the operation. In this work, we reduce the latency of Barrier operations on Cray XE/XK systems by leveraging the atomic operations provided by the Gemini interconnect, tailoring algorithms to these capabilities, and using a hierarchical design to arrive at an efficient implementation. Our micro-benchmark evaluation shows that, for a 4,096-process Barrier operation, the atomic-operations-based Barrier outperforms the data-exchange Barrier by 52% and the native Barrier by 111%.
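To illustrate the general idea of synchronizing through atomic operations rather than exchanged messages, the following is a minimal sketch of a flat counter-based barrier. It is not the paper's implementation and makes no Gemini, GNI, or DMAPP calls: the names `AtomicCounter`, `CounterBarrier`, and `run_demo` are hypothetical, threads stand in for processes, and the hardware fetch-and-add is emulated with a lock. The hierarchical design mentioned in the abstract is not modeled.

```python
import threading
import time


class AtomicCounter:
    """Stand-in for a hardware atomic counter (assumption: emulated with a lock)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_add(self, delta):
        # Atomically add delta and return the previous value.
        with self._lock:
            old = self._value
            self._value += delta
            return old

    def load(self):
        with self._lock:
            return self._value


class CounterBarrier:
    """Reusable barrier: one fetch-and-add per participant, then spin on the counter."""

    def __init__(self, n):
        self.n = n
        self.counter = AtomicCounter()

    def wait(self):
        ticket = self.counter.fetch_add(1)
        # The counter only grows, so the episode that this ticket belongs to
        # is complete once the counter reaches the next multiple of n.
        target = (ticket // self.n + 1) * self.n
        while self.counter.load() < target:
            time.sleep(0)  # yield to other threads while spinning


def run_demo(n_threads=4, rounds=3):
    """Return the arrival counts each thread observes right after each barrier."""
    barrier = CounterBarrier(n_threads)
    arrived = [AtomicCounter() for _ in range(rounds)]
    seen = []  # list.append is thread-safe in CPython

    def worker():
        for r in range(rounds):
            arrived[r].fetch_add(1)   # announce arrival in round r
            barrier.wait()            # nobody passes until all have arrived
            seen.append(arrived[r].load())

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    print(run_demo())
```

Each participant performs a single atomic update and then only reads the counter, so there is no message to match or demultiplex. If the barrier is correct, every value returned by `run_demo` equals the number of threads.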
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gorentla Venkata, M., Graham, R.L., Ladd, J.S., Shamis, P., Hjelm, N.T., Gutierrez, S.K. (2012). Exploiting Atomic Operations for Barrier on Cray XE/XK Systems. In: Träff, J.L., Benkner, S., Dongarra, J.J. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2012. Lecture Notes in Computer Science, vol 7490. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33518-1_13
DOI: https://doi.org/10.1007/978-3-642-33518-1_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33517-4
Online ISBN: 978-3-642-33518-1
eBook Packages: Computer Science, Computer Science (R0)