Abstract
Existing algorithms for creating communicators in MPI programs will not scale to future exascale supercomputers containing millions of cores. In this work, we present a novel communicator-creation algorithm that scales well to millions of processes using three techniques: replacing the sort at the end of MPI_Comm_split with merging as the color-and-key table is built, sorting the color-and-key table in parallel, and storing the output communicator data in a distributed table rather than a replicated one. These changes reduce the time cost of MPI_Comm_split in the worst case we consider from 22 seconds to 0.37 seconds. Existing algorithms build a table with as many entries as there are processes, consuming vast amounts of memory. Our algorithm uses a small, fixed amount of memory per communicator after MPI_Comm_split has finished, and only a fraction of the memory used by the conventional algorithm for temporary storage while MPI_Comm_split executes.
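The first technique above, merging entries into a sorted color-and-key table as they arrive instead of sorting the whole table at the end, can be illustrated with a minimal sketch. This is a Python illustration of the idea under MPI_Comm_split semantics (processes with the same color form a new communicator, ordered by key with ties broken by old rank), not the authors' implementation; the 6-process entry list is hypothetical.

```python
import bisect

def build_table(entries):
    """Maintain a (color, key, rank) table in sorted order as entries
    arrive, rather than sorting the full table once at the end (the
    conventional approach the paper replaces)."""
    table = []
    for color, key, rank in entries:
        # Insert each triple at its sorted position. Comparing tuples
        # orders first by color, then key, then original rank, which
        # matches MPI_Comm_split's tie-breaking rule.
        bisect.insort(table, (color, key, rank))
    return table

# Hypothetical 6-process example: two colors, arbitrary keys.
entries = [(1, 5, 0), (0, 2, 1), (1, 1, 2), (0, 9, 3), (1, 5, 4), (0, 2, 5)]
table = build_table(entries)

# Old ranks belonging to the color-0 communicator, in new-rank order:
color0 = [rank for color, key, rank in table if color == 0]
print(color0)  # → [1, 5, 3]
```

Because each insertion keeps the table sorted, the final sort step disappears; the paper combines this with parallel sorting and a distributed table so that no process ever holds the full table.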
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Sack, P., Gropp, W. (2010). A Scalable MPI_Comm_split Algorithm for Exascale Computing. In: Keller, R., Gabriel, E., Resch, M., Dongarra, J. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2010. Lecture Notes in Computer Science, vol 6305. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15646-5_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15645-8
Online ISBN: 978-3-642-15646-5