Improving MPI Applications Performance on Multicore Clusters with Rank Reordering

Mercier, Guillaume; Jeannot, Emmanuel

doi:10.1007/978-3-642-24449-0_7

Guillaume Mercier¹⁹ &
Emmanuel Jeannot¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 6960))

Included in the following conference series:

European MPI Users' Group Meeting

1159 Accesses
20 Citations

Abstract

Modern hardware architectures featuring multicores and a complex memory hierarchy raise challenges that need to be addressed by parallel applications programmers. It is therefore tempting to adapt an application communication pattern to the characteristics of the underlying hardware. The MPI standard features several functions that allow the ranks of MPI processes to be reordered according to a graph attached to a newly created communicator. In this paper, we explain how the MPICH2 implementation of the MPI_Dist_graph_create function was modified to reorder the MPI process ranks to create a match between the application communication pattern and the hardware topology. The experimental results on a multicore cluster show that improvements can be achieved as long as the application communication pattern is expressed by a relevant metric.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Hatazaki, T.: Rank reordering strategy for MPI topology creation functions. In: Alexandrov, V.N., Dongarra, J. (eds.) PVM/MPI 1998. LNCS, vol. 1497, pp. 188–195. Springer, Heidelberg (1998)
Chapter Google Scholar
Träff, J.L.: Implementing the MPI process topology mechanism. In: Supercomputing 2002: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing, pp. 1–14. IEEE Computer Society Press, Los Alamitos (2002)
Google Scholar
Zhu, H., Goodell, D., Gropp, W., Thakur, R.: Hierarchical Collectives in MPICH2. In: Ropo, M., Westerholm, J., Dongarra, J. (eds.) PVM/MPI. LNCS, vol. 5759, pp. 325–326. Springer, Heidelberg (2009)
Chapter Google Scholar
Ma, C., Teo, Y.M., March, V., Xiong, N., Pop, I.R., He, Y.X., See, S.: An Approach for Matching Communication Patterns in Parallels Applications. In: Proceedings of 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS 2009). IEEE Computer Society Press, Rome (2009)
Google Scholar
Mercier, G., Clet-Ortega, J.: Towards an efficient process placement policy for MPI applications in multicore environments. In: Ropo, M., Westerholm, J., Dongarra, J. (eds.) PVM/MPI. LNCS, vol. 5759, pp. 104–115. Springer, Heidelberg (2009)
Chapter Google Scholar
Hoefler, T., Rabenseifner, R., Ritzdorf, H., de Supinski, B.R., Thakur, R., Träff, J.L.: The scalable process topology interface of mpi 2.2. Concurrency and Computation: Practice and Experience 23, 293–310 (2011)
Article Google Scholar
Argonne National Laboratory: MPICH2 (2004), http://www.mcs.anl.gov/mpi/
Broquedis, F., Clet-Ortega, J., Moreaud, S., Furmento, N., Goglin, B., Mercier, G., Thibault, S., Namyst, R.: hwloc: a Generic Framework for Managing Hardware Affinities in HPC Applications. In: Proceedings of the 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2010), Pisa, Italia. IEEE Computer Society Press, Los Alamitos (2010)
Google Scholar
Jeannot, E., Mercier, G.: Near-optimal placement of MPI processes on hierarchical NUMA architectures. In: D’Ambra, P., Guarracino, M., Talia, D. (eds.) Euro-Par 2010. LNCS, vol. 6272, pp. 199–210. Springer, Heidelberg (2010)
Chapter Google Scholar
Design of High Performance MVAPICH2: MPI2 over InfiniBand. In: Proceedings of the 6th IEEE International Symposium on Cluster Computing and the Grid. IEEE Computer Society, Los Alamitos (2006)
Google Scholar
Hayes, J.C., Norman, M.L., Fiedler, R.A., Bordner, J.O., Li, P.S., Clark, S.E., Ud-Doula, A., Mac Low, M.-M.: Simulating Radiating and Magnetized Flows in Multiple Dimensions with ZEUS-MP. The Astrophysical Journal Supplement 165, 188–228 (2006)
Article Google Scholar
Kleinjung, T., Aoki, K., Franke, J., Lenstra, A.K., Thomé, E., Bos, J.W., Gaudry, P., Kruppa, A., Montgomery, P.L., Osvik, D.A., te Riele, H., Timofeev, A., Zimmermann, P.: Factorization of a 768-bit RSA modulus. In: Rabin, T. (ed.) CRYPTO 2010. LNCS, vol. 6223, pp. 333–350. Springer, Heidelberg (2010), http://www.springerlink.com
Chapter Google Scholar
Kernighan, B.W., Lin, S.: An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal 49, 291–307 (1970)
Article MATH Google Scholar
Pellegrini, F.: Static Mapping by Dual Recursive Bipartitioning of Process and Architecture Graphs. In: IEEE Proceedings of SHPCC 1994, Knoxville, pp. 486–493 (1994)
Google Scholar
Chen, H., Chen, W., Huang, J., Robert, B., Kuhn, H.: Mpipp: an automatic profile-guided parallel process placement toolset for smp clusters and multiclusters. In: Egan, G.K., Muraoka, Y. (eds.) ICS, ACM, pp. 353–360 (2006)
Google Scholar
National Institute for Computational Sciences: (MPI Tips on Cray XT5), http://www.nics.tennessee.edu/user-support/mpi-tips-for-cray-xt5
Solt, D.: A profile based approach for topology aware MPI rank placement (2007), http://www.tlc2.uh.edu/hpcc07/Schedule/speakers/hpcc_hp-mpi_solt.ppt
Duesterwald, E., Wisniewski, R.W., Sweeney, P.F., Cascaval, G., Smith, S.E.: Method and System for Optimizing Communication in MPI Programs for an Execution Environment (2008), http://www.faqs.org/patents/app/20080288957

Download references

Author information

Authors and Affiliations

Université de Bordeaux, INRIA, LaBRI, 351, cours de la Libération, F-33405, Talence, France
Guillaume Mercier & Emmanuel Jeannot

Authors

Guillaume Mercier
View author publications
You can also search for this author in PubMed Google Scholar
Emmanuel Jeannot
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Informatics and Telecommunications, University of Athens, 15784, Athens, Greece
Yiannis Cotronis
University of Tennessee, 1122 Volunteer Blvd, 37996-3450, Knoxville, TN, USA
Anthony Danalis & Jack Dongarra &
University of Crete, Heraklion, Greece
Dimitrios S. Nikolopoulos

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Mercier, G., Jeannot, E. (2011). Improving MPI Applications Performance on Multicore Clusters with Rank Reordering. In: Cotronis, Y., Danalis, A., Nikolopoulos, D.S., Dongarra, J. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2011. Lecture Notes in Computer Science, vol 6960. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24449-0_7

Download citation

DOI: https://doi.org/10.1007/978-3-642-24449-0_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24448-3
Online ISBN: 978-3-642-24449-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics