Optimization of Collective Communication in Intra-cell MPI

Velamati, M. K.; Kumar, A.; Jayam, N.; Senthilkumar, G.; Baruah, P. K.; Sharma, R.; Kapoor, S.; Srinivasan, A.

doi:10.1007/978-3-540-77220-0_45


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4873)

Included in the following conference series: International Conference on High-Performance Computing


Abstract

The Cell is a heterogeneous multi-core processor with eight co-processors, called SPEs. The SPEs can access a common shared main memory through DMA, and each SPE can operate directly on its own small local store. An MPI implementation can treat each SPE as a node running an MPI process. In this paper, we discuss the efficient implementation of collective communication operations for intra-Cell MPI, both for cores on a single chip and for a Cell blade. While we have implemented all the collective operations, we describe three in detail: barrier, broadcast, and reduce. The main contributions of this work are (i) a description of our implementation, which achieves low latencies and high bandwidths by using the unique features of the Cell, and (ii) a comparison of different algorithms, with an evaluation of how the architectural features of the Cell processor influence their effectiveness.
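
As a rough illustration of the kind of algorithm such a comparison might include, the sketch below computes the communication schedule of a binomial-tree broadcast over eight ranks, one per SPE. This is a generic textbook algorithm and is not necessarily one used in the paper. The function name `binomial_broadcast_schedule` and its parameters are hypothetical, and the code only models which rank sends to which in each round; it performs no DMA and uses no Cell-specific or MPI calls.

```python
def binomial_broadcast_schedule(num_ranks, root=0):
    """Return the rounds of a binomial-tree broadcast.

    Each round is a list of (sender, receiver) rank pairs that could
    proceed in parallel. Hypothetical helper for illustration only.
    """
    rounds = []
    step = 1  # number of ranks (relative to root) that already hold the data
    while step < num_ranks:
        pairs = []
        for rel in range(step):
            dst_rel = rel + step
            if dst_rel < num_ranks:
                # Map ranks relative to the root back to absolute ranks.
                src = (rel + root) % num_ranks
                dst = (dst_rel + root) % num_ranks
                pairs.append((src, dst))
        rounds.append(pairs)
        step *= 2  # the set of ranks holding the data doubles each round
    return rounds


if __name__ == "__main__":
    # Assumption: one MPI rank per SPE, eight SPEs on a single chip.
    for i, pairs in enumerate(binomial_broadcast_schedule(8), start=1):
        print(f"round {i}: {pairs}")
```

With eight ranks the data reaches every rank in three rounds, since the number of ranks holding it doubles each round. Whether a tree of this shape is the best choice on the Cell would depend on the architectural features the abstract mentions, which this model does not capture.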



Author information

Authors and Affiliations

Dept. of Mathematics and Computer Science, Sri Sathya Sai University
M. K. Velamati, A. Kumar, N. Jayam, G. Senthilkumar, P. K. Baruah & R. Sharma
IBM, Austin
S. Kapoor
Dept. of Computer Science, Florida State University
A. Srinivasan


Editor information

Srinivas Aluru, Manish Parashar, Ramamurthy Badrinath, Viktor K. Prasanna


Cite this paper

Velamati, M.K. et al. (2007). Optimization of Collective Communication in Intra-cell MPI. In: Aluru, S., Parashar, M., Badrinath, R., Prasanna, V.K. (eds) High Performance Computing – HiPC 2007. HiPC 2007. Lecture Notes in Computer Science, vol 4873. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77220-0_45


DOI: https://doi.org/10.1007/978-3-540-77220-0_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77219-4
Online ISBN: 978-3-540-77220-0
eBook Packages: Computer Science, Computer Science (R0)
