Assessing Contention Effects on MPI_Alltoall Communications

  • Luiz Angelo Steffenel
  • Maxime Martinasso
  • Denis Trystram
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4459)


One of the most important collective communication patterns used in scientific applications is the complete exchange, also called All-to-All. Although efficient algorithms have been studied for specific networks, general solutions like those available in well-known MPI distributions (e.g., the MPI_Alltoall operation) are strongly influenced by the congestion of network resources. In this paper we present an integrated approach to model the performance of the All-to-All collective operation, which consists of identifying a contention signature that characterizes a given network environment and using it to augment a contention-free communication model. This approach, assessed by experimental results, allows accurate prediction of the performance of the All-to-All operation over different network architectures with a small overhead.
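To illustrate the idea of augmenting a contention-free model with a measured contention signature, the sketch below uses a simple Hockney-style linear cost model (latency plus per-byte cost) and a single empirical contention factor. The function name, the multiplicative form of the contention term, and all numeric values are illustrative assumptions, not the model or measurements from this paper.

```python
# Illustrative sketch (not the authors' exact model): a contention-free
# Hockney-style cost model for All-to-All among p processes, augmented
# by a per-network "contention signature" gamma fitted from benchmarks.

def alltoall_time(p, m, alpha, beta, gamma):
    """Predicted MPI_Alltoall time for p processes, message size m bytes.

    alpha: per-message latency (s); beta: per-byte transfer cost (s/byte);
    these are the contention-free parameters. gamma is an empirical
    contention factor (gamma = 0 recovers the contention-free prediction).
    """
    # Each process sends p - 1 messages of m bytes to the other processes.
    contention_free = (p - 1) * (alpha + beta * m)
    # Assumed contention term: the penalty grows with the number of
    # simultaneous flows sharing the network.
    return contention_free * (1.0 + gamma * (p - 1))

# Example with made-up parameters: 16 processes, 64 KiB messages,
# alpha = 50 us, beta = 1 ns/byte, gamma = 0.02.
t = alltoall_time(16, 64 * 1024, 50e-6, 1e-9, 0.02)
print(f"predicted All-to-All time: {t * 1000:.2f} ms")
```

In practice the contention-free parameters would be fitted from point-to-point benchmarks, and gamma from a small set of All-to-All runs on the target network; the prediction then extrapolates to other process counts and message sizes.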


Keywords: Network Contention, MPI, Collective Communications, Performance Modeling





Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Luiz Angelo Steffenel (1)
  • Maxime Martinasso (2)
  • Denis Trystram (2)

  1. Université Nancy-2, LORIA, AlGorille Team, Nancy, France
  2. LIG - Laboratoire d'Informatique de Grenoble, Grenoble, France
