Optimization of Collective Communication in Intra-cell MPI

  • Conference paper
High Performance Computing – HiPC 2007 (HiPC 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4873)

Abstract

The Cell is a heterogeneous multi-core processor, which has eight co-processors, called SPEs. The SPEs can access a common shared main memory through DMA, and each SPE can directly operate on a small distinct local store. An MPI implementation can use each SPE as if it were a node for an MPI process. In this paper, we discuss the efficient implementation of collective communication operations for intra-Cell MPI, both for cores on a single chip, and for a Cell blade. While we have implemented all the collective operations, we describe in detail the following: barrier, broadcast, and reduce. The main contributions of this work are (i) describing our implementation, which achieves low latencies and high bandwidths using the unique features of the Cell, and (ii) comparing different algorithms, and evaluating the influence of the architectural features of the Cell processor on their effectiveness.
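As an illustration of the kind of algorithm such a comparison could include, the sketch below simulates a binomial-tree broadcast among a small number of processes. The abstract does not state which algorithms the authors implemented, so the choice of a binomial tree, the function names `binomial_broadcast_schedule` and `simulate_broadcast`, and the process counts used are all assumptions made for illustration. The simulation only models which rank sends to which in each round; it does not model DMA transfers or local stores.

```python
def binomial_broadcast_schedule(num_procs, root=0):
    """Return the rounds of a binomial-tree broadcast.

    Each round is a list of (sender, receiver) rank pairs. Ranks are
    handled relative to the root, so any rank can act as the root.
    """
    rounds = []
    mask = 1
    while mask < num_procs:
        sends = []
        # Relative ranks 0..mask-1 already hold the data at this point.
        for rel in range(mask):
            peer = rel + mask
            if peer < num_procs:
                sender = (rel + root) % num_procs
                receiver = (peer + root) % num_procs
                sends.append((sender, receiver))
        rounds.append(sends)
        mask <<= 1  # the set of ranks holding the data doubles each round
    return rounds


def simulate_broadcast(num_procs, root, value):
    """Apply the schedule to per-rank buffers and return the final buffers."""
    buffers = [None] * num_procs
    buffers[root] = value
    for sends in binomial_broadcast_schedule(num_procs, root):
        for src, dst in sends:
            # A sender must already hold the data before it forwards it.
            assert buffers[src] is not None
            buffers[dst] = buffers[src]
    return buffers


if __name__ == "__main__":
    # Eight processes, matching the eight SPEs mentioned in the abstract.
    for round_no, sends in enumerate(binomial_broadcast_schedule(8), start=1):
        print(f"round {round_no}: {sends}")
```

With eight processes the schedule has three rounds, since the number of ranks holding the data doubles in each round. Whether a tree of this shape is actually the best choice on the Cell depends on the architectural features the paper evaluates, which this sketch does not capture.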

Editor information

Srinivas Aluru, Manish Parashar, Ramamurthy Badrinath, Viktor K. Prasanna

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Velamati, M.K. et al. (2007). Optimization of Collective Communication in Intra-cell MPI. In: Aluru, S., Parashar, M., Badrinath, R., Prasanna, V.K. (eds) High Performance Computing – HiPC 2007. HiPC 2007. Lecture Notes in Computer Science, vol 4873. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77220-0_45

  • DOI: https://doi.org/10.1007/978-3-540-77220-0_45

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77219-4

  • Online ISBN: 978-3-540-77220-0

  • eBook Packages: Computer Science, Computer Science (R0)
