Abstract
MPI process placement can play a deterministic role concerning the application performance. This is especially true with nowadays architecture (heterogenous, multicore with different level of caches, etc.). In this paper, we will describe a novel algorithm called TreeMatch that maps processes to resources in order to reduce the communication cost of the whole application. We have implemented this algorithm and will discuss its performance using simulation and on the NAS benchmarks.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Broquedis, F., Clet-Ortega, J., Moreaud, S., Furmento, N., Goglin, B., Mercier, G., Thibault, S., Namyst, R.: hwloc: a Generic Framework for Managing Hardware Affinities in HPC Applications. In: Proceedings of the 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2010). IEEE Computer Society Press, Pisa (February 2010), http://hal.inria.fr/inria-00429889
Chen, H., Chen, W., Huang, J., Robert, B., Kuhn, H.: Mpipp: an automatic profile-guided parallel process placement toolset for smp clusters and multiclusters. In: Egan, G.K., Muraoka, Y. (eds.) ICS, pp. 353–360. ACM, New York (2006)
Solt, D.: A profile based approach for topology aware MPI rank placement (2007), http://www.tlc2.uh.edu/hpcc07/Schedule/speakers/hpcc_hp-mpi_solt.ppt
Duesterwald, E., Wisniewski, R.W., Sweeney, P.F., Cascaval, G., Smith, S.E.: Method and System for Optimizing Communication in MPI Programs for an Execution Environment (2008), http://www.faqs.org/patents/app/20080288957
Pellegrini, F.: Static Mapping by Dual Recursive Bipartitioning of Process and Architecture Graphs. In: Proceedings of SHPCC 1994, Knoxville, pp. 486–493. IEEE, Los Alamitos (May 1994)
Träff, J.L.: Implementing the MPI process topology mechanism. In: Supercomputing 2002: Proceedings of the 2002 ACM/IEEE conference on Supercomputing, pp. 1–14. IEEE Computer Society Press, Los Alamitos (2002)
Kako, A., Ono, T., Hirata, T., Halldórsson, M.M.: Approximation algorithms for the weighted independent set problem. In: Kratsch, D. (ed.) WG 2005. LNCS, vol. 3787, pp. 341–350. Springer, Heidelberg (2005)
Kneser, M.: Aufgabe 300. Jahresber. Deutsch. Math. -Verein 58 (1955)
Kruskal, C., Snir, M.: Cost-performance tradeoffs for communication networks. Discrete Applied Mathematics 37-38, 359–385 (1992)
Mercier, G., Clet-Ortega, J.: Towards an efficient process placement policy for mpi applications in multicore environments. In: Ropo, M., Westerholm, J., Dongarra, J. (eds.) Recent Advances in Parallel Virtual Machine and Message Passing Interface. LNCS, vol. 5759, pp. 104–115. Springer, Heidelberg (2009)
Zhai, J., Sheng, T., He, J., Chen, W., Zheng, W.: Fact: fast communication trace collection for parallel applications through program slicing. In: SC. ACM, New York (2009)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jeannot, E., Mercier, G. (2010). Near-Optimal Placement of MPI Processes on Hierarchical NUMA Architectures. In: D’Ambra, P., Guarracino, M., Talia, D. (eds) Euro-Par 2010 - Parallel Processing. Euro-Par 2010. Lecture Notes in Computer Science, vol 6272. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15291-7_20
Download citation
DOI: https://doi.org/10.1007/978-3-642-15291-7_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15290-0
Online ISBN: 978-3-642-15291-7
eBook Packages: Computer ScienceComputer Science (R0)