Abstract
As the scale of High Performance Computing (HPC) systems continues to grow, demanding that applications expose ever more parallelism, the need to move communication management off the Central Processing Unit (CPU) grows with it. Offloading this management to the network frees CPU cycles for computation and makes it possible to overlap computation with communication. In this paper we continue to investigate how best to use the new CORE-Direct support in the ConnectX-2 Host Channel Adapter (HCA) to create high-performance, asynchronous collective operations managed entirely by the HCA. Specifically, we take network topology into account by creating a two-level communication hierarchy. This reduces MPI_Barrier completion time by 45%, from 26.59 microseconds for the topology-unaware offloaded barrier to 14.72 microseconds, compared with 19.04 microseconds for the CPU-based barrier. The nonblocking barrier algorithm performs similarly, with about 50% of its completion time available for overlapping computation.
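The two ideas in the abstract, a topology-aware two-level barrier and a nonblocking barrier that leaves most of its completion time free for computation, can be sketched on the host with standard MPI. The sketch below is an analogy only, not the paper's method: the paper posts the corresponding communication tasks to the ConnectX-2 HCA's CORE-Direct management queues so the barrier progresses without CPU involvement, and it predates the MPI-3 calls (MPI_Comm_split_type, MPI_Ibarrier) used here for brevity. The helpers hierarchical_barrier and do_local_work are hypothetical names introduced for illustration.

#include <mpi.h>

/* Host-side analogy of a two-level barrier: synchronize within each node,
 * run one inter-node barrier among per-node leaders, then release the
 * local ranks. In the paper these steps execute on the HCA, not the CPU.
 * In practice the communicators would be built once and cached, not
 * recreated on every call. */
static void hierarchical_barrier(MPI_Comm comm)
{
    MPI_Comm node_comm, leader_comm;
    int node_rank;

    /* Level 1: ranks that share a node. */
    MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    MPI_Comm_rank(node_comm, &node_rank);

    /* Level 2: one leader per node; all other ranks get MPI_COMM_NULL. */
    MPI_Comm_split(comm, node_rank == 0 ? 0 : MPI_UNDEFINED, 0, &leader_comm);

    MPI_Barrier(node_comm);               /* fan-in: all local ranks arrive  */
    if (leader_comm != MPI_COMM_NULL) {
        MPI_Barrier(leader_comm);         /* single inter-node exchange      */
        MPI_Comm_free(&leader_comm);
    }
    MPI_Barrier(node_comm);               /* fan-out: leader releases locals */

    MPI_Comm_free(&node_comm);
}

/* Stand-in for application computation overlapped with the barrier. */
static void do_local_work(void)
{
    volatile double x = 0.0;
    for (int i = 0; i < 1000000; ++i)
        x += 1.0 / (i + 1.0);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    hierarchical_barrier(MPI_COMM_WORLD);   /* blocking, two-level variant */

    /* Nonblocking pattern behind the ~50% overlap figure: start the
     * barrier, compute while the network progresses it, then complete. */
    MPI_Request req;
    MPI_Ibarrier(MPI_COMM_WORLD, &req);
    do_local_work();
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}

With an offloaded implementation, the time between MPI_Ibarrier and MPI_Wait is CPU time genuinely available to the application, which is what the roughly 50% overlap figure in the abstract measures.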
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rabinovitz, I., Shamis, P., Graham, R.L., Bloch, N., Shainer, G. (2010). Network Offloaded Hierarchical Collectives Using ConnectX-2’s CORE-Direct Capabilities. In: Keller, R., Gabriel, E., Resch, M., Dongarra, J. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2010. Lecture Notes in Computer Science, vol 6305. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15646-5_11
DOI: https://doi.org/10.1007/978-3-642-15646-5_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15645-8
Online ISBN: 978-3-642-15646-5