
A Scalable MPI_Comm_split Algorithm for Exascale Computing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6305)

Abstract

Existing algorithms for creating communicators in MPI programs will not scale well to future exascale supercomputers containing millions of cores. In this work, we present a novel communicator-creation algorithm that scales well to millions of processes by combining three techniques: replacing the sort at the end of MPI_Comm_split with a merge performed as the color-and-key table is built, sorting the color-and-key table in parallel, and storing the output communicator data in a distributed table rather than a replicated one. These changes reduce the time cost of MPI_Comm_split in the worst case we consider from 22 seconds to 0.37 seconds. Existing algorithms build a table with as many entries as there are processes, consuming vast amounts of memory. Our algorithm uses a small, fixed amount of memory per communicator after MPI_Comm_split has finished, and only a fraction of the memory used by the conventional algorithm for temporary storage while MPI_Comm_split executes.
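For readers unfamiliar with the interface the paper optimizes, the sketch below shows how MPI_Comm_split is typically called: every process supplies a color selecting which output communicator it joins and a key determining its rank order within it. This is only the standard MPI API from the caller's point of view, not the authors' internal algorithm; the even/odd split by rank parity is an arbitrary example chosen for illustration.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Split MPI_COMM_WORLD into two communicators by rank parity.
     * color: which new communicator this process joins.
     * key:   ordering of ranks inside the new communicator. */
    int color = world_rank % 2;
    int key   = world_rank;

    MPI_Comm subcomm;
    MPI_Comm_split(MPI_COMM_WORLD, color, key, &subcomm);

    int sub_rank, sub_size;
    MPI_Comm_rank(subcomm, &sub_rank);
    MPI_Comm_size(subcomm, &sub_size);
    printf("world rank %d/%d -> color %d, sub rank %d/%d\n",
           world_rank, world_size, color, sub_rank, sub_size);

    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}

The paper's contribution concerns how an MPI implementation services this call internally: conventionally every process gathers the full list of (color, key) pairs and sorts it, whereas the proposed algorithm merges while building the table, sorts it in parallel, and keeps the result distributed.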




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sack, P., Gropp, W. (2010). A Scalable MPI_Comm_split Algorithm for Exascale Computing. In: Keller, R., Gabriel, E., Resch, M., Dongarra, J. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2010. Lecture Notes in Computer Science, vol 6305. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15646-5_1


  • DOI: https://doi.org/10.1007/978-3-642-15646-5_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15645-8

  • Online ISBN: 978-3-642-15646-5

  • eBook Packages: Computer Science (R0)
