Cluster Computing, Volume 10, Issue 2, pp. 127–143

Performance analysis of MPI collective operations

  • Jelena Pješivac-Grbović
  • Thara Angskun
  • George Bosilca
  • Graham E. Fagg
  • Edgar Gabriel
  • Jack J. Dongarra


Previous studies of application usage show that the performance of collective communications is critical for high-performance computing. Despite active research in the field, a solution to the collective communication optimization problem that is both general and feasible is still missing.

In this paper, we analyze and attempt to improve intra-cluster collective communication in the context of the widely deployed MPI programming paradigm by extending accepted models of point-to-point communication, such as Hockney, LogP/LogGP, and PLogP, to collective operations. We compare the models' predictions against experimentally gathered data and, using these results, construct optimal decision functions for the broadcast collective. We then quantitatively compare the quality of the model-based decision functions with that of the experimentally optimal one. Additionally, we introduce a new optimized tree-based broadcast algorithm, splitted-binary.
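The idea of extending a point-to-point model to collectives and then deriving a decision function from it can be illustrated with the Hockney model, T(m) = α + βm. The sketch below uses the standard textbook Hockney costs of three common broadcast shapes (flat tree, binomial tree, and a segmented chain pipeline) and picks the one with the lowest prediction. It is a minimal sketch under stated assumptions: the function names, candidate segment sizes, and α/β values are hypothetical, and these are not the paper's formulas or its decision function.

```python
import math

# Hypothetical helpers illustrating the Hockney model T(m) = alpha + beta * m.
# Units are up to the caller (e.g. microseconds and microseconds per byte).


def hockney_p2p(alpha: float, beta: float, m: float) -> float:
    """Time to send one m-byte message point-to-point."""
    return alpha + beta * m


def bcast_linear(p: int, m: int, alpha: float, beta: float) -> float:
    # Flat tree: the root sends the whole message to each of the p - 1 others in turn.
    return (p - 1) * hockney_p2p(alpha, beta, m)


def bcast_binomial(p: int, m: int, alpha: float, beta: float) -> float:
    # Binomial tree: ceil(log2 p) rounds, each carrying the whole message.
    return math.ceil(math.log2(p)) * hockney_p2p(alpha, beta, m)


def bcast_pipeline(p: int, m: int, alpha: float, beta: float, seg: int) -> float:
    # Chain pipeline: ns segments of at most seg bytes; the last segment reaches
    # the last process after (p - 1) + (ns - 1) steps. Every step is charged as a
    # full seg-byte transfer, a slight overestimate when seg does not divide m.
    ns = math.ceil(m / seg)
    return (p - 1 + ns - 1) * hockney_p2p(alpha, beta, seg)


def choose_bcast(p: int, m: int, alpha: float, beta: float,
                 segments: tuple[int, ...] = (1024, 8192, 65536)) -> tuple[str, float]:
    """Hypothetical model-based decision function: cheapest predicted algorithm."""
    predicted = {
        "linear": bcast_linear(p, m, alpha, beta),
        "binomial": bcast_binomial(p, m, alpha, beta),
    }
    for seg in segments:
        if seg < m:  # segmenting only makes sense for messages larger than a segment
            predicted[f"pipeline/{seg}"] = bcast_pipeline(p, m, alpha, beta, seg)
    best = min(predicted, key=predicted.get)
    return best, predicted[best]


if __name__ == "__main__":
    # Hypothetical parameters: alpha = 50 us, beta = 0.01 us/byte, 16 processes.
    for size in (8, 1 << 20):
        print(size, choose_bcast(16, size, 50.0, 0.01))
```

With these hypothetical parameters, the sketch selects the binomial tree for an 8-byte message and the pipeline with 8 KB segments for a 1 MB message. As the abstract notes, such purely model-based choices still need to be checked against measurements.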

Our results show that all of the models can provide useful insights into various aspects of the different algorithms and into their relative performance. Still, based on our findings, we believe that relying completely on models would not yield optimal results. In addition, our experimental results identify the gap parameter as the most critical one for accurately modeling both the classical point-to-point-based pipeline and our extensions to fan-out topologies.
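One way to see why the gap parameter matters so much for pipelines is a simplified LogGP estimate of a segmented chain broadcast. Once the pipeline has filled, each additional segment costs roughly one steady-state period, and that period contains g. The sketch below is a simplified estimate, not the paper's model. It assumes the LogGP point-to-point cost L + 2o + (k − 1)G, treats every stage as an intermediate process that pays a receive and a send overhead per segment, and ignores contention. The names `LogGP` and `chain_bcast` and all parameter values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class LogGP:
    L: float  # end-to-end network latency
    o: float  # CPU overhead of one send or one receive
    g: float  # gap: minimum interval between consecutive message injections
    G: float  # gap per byte for long messages

    def p2p(self, k: int) -> float:
        """LogGP time to deliver one k-byte message: L + 2o + (k - 1)G."""
        return self.L + 2 * self.o + (k - 1) * self.G


def chain_bcast(net: LogGP, p: int, m: int, seg: int) -> float:
    """Simplified (hypothetical) estimate of a segmented chain broadcast.

    Fill: the first segment crosses p - 1 hops. Steady state: each further
    segment costs one period, bounded below by the injection interval
    g + (seg - 1)G and by the 2o an intermediate process spends receiving and
    forwarding it. Contention and per-stage differences are ignored.
    """
    ns = math.ceil(m / seg)
    fill = (p - 1) * net.p2p(seg)
    period = max(net.g + (seg - 1) * net.G, 2 * net.o)
    return fill + (ns - 1) * period


if __name__ == "__main__":
    # Hypothetical parameters in microseconds (G in microseconds per byte).
    base = LogGP(L=5.0, o=2.0, g=10.0, G=0.01)
    slow_gap = LogGP(L=5.0, o=2.0, g=20.0, G=0.01)
    t1 = chain_bcast(base, p=8, m=65536, seg=1024)
    t2 = chain_bcast(slow_gap, p=8, m=65536, seg=1024)
    print(f"g=10: {t1:.2f} us, g=20: {t2:.2f} us")
```

In this example the message is split into 64 segments. Raising g by 10 µs therefore adds 63 × 10 = 630 µs to the estimate, while the fill term does not change. Any error in the measured g is multiplied by the number of segments in the same way.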


Keywords: MPI collective communication, Performance modeling, Parallel communication models, Hockney, LogP, LogGP, PLogP




References

  1. Rabenseifner, R.: Automatic MPI counter profiling of all users: First results on a CRAY T3E 900-512. In: Proceedings of the Message Passing Interface Developer’s and User’s Conference, 1999, pp. 77–85
  2. Vadhiyar, S.S., Fagg, G.E., Dongarra, J.J.: Automatically tuned collective communications. In: Proceedings of the 2000 ACM/IEEE conference on Supercomputing (CDROM), IEEE Computer Society, 2000, p. 3
  3. Hockney, R.: The communication challenge for MPP: Intel Paragon and Meiko CS-2. Parallel Comput. 20(3), 389–398 (1994)
  4. Culler, D., Karp, R., Patterson, D., Sahay, A., Schauser, K.E., Santos, E., Subramonian, R., von Eicken, T.: LogP: Towards a realistic model of parallel computation. In: Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming, pp. 1–12. ACM Press, New York (1993)
  5. Alexandrov, A., Ionescu, M.F., Schauser, K.E., Scheiman, C.: LogGP: Incorporating long messages into the LogP model. In: Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures, pp. 95–105. ACM Press, New York (1995)
  6. Kielmann, T., Bal, H., Verstoep, K.: Fast measurement of LogP parameters for message passing platforms. In: Rolim, J.D.P. (ed.) IPDPS Workshops, Cancun, Mexico. Lecture Notes in Computer Science, vol. 1800, pp. 1176–1183. Springer-Verlag, London (2000)
  7. Culler, D., Liu, L.T., Martin, R.P., Yoshikawa, C.: Assessing fast network interfaces. IEEE Micro 16, 35–43 (1996)
  8. Fagg, G.E., Gabriel, E., Chen, Z., Angskun, T., Bosilca, G., Bukovsky, A., Dongarra, J.J.: Fault tolerant communication library and applications for high performance computing. In: LACSI Symposium, 2003
  9. Grama, A., Gupta, A., Karypis, G., Kumar, V.: Introduction to Parallel Computing, second edn. Pearson Education Limited, Addison-Wesley Longman, Boston (2003)
  10. Thakur, R., Gropp, W.: Improving the performance of collective operations in MPICH. In: Dongarra, J., Laforenza, D., Orlando, S. (eds.) Recent Advances in Parallel Virtual Machine and Message Passing Interface. LNCS, vol. 2840, pp. 257–267. Springer-Verlag (2003), 10th European PVM/MPI User’s Group Meeting, Venice, Italy
  11. Chan, E.W., Heimlich, M.F., Purkayastha, A., van de Geijn, R.M.: On optimizing collective communication. In: Cluster (2004)
  12. Rabenseifner, R., Träff, J.L.: More efficient reduction algorithms for non-power-of-two number of processors in message-passing parallel systems. In: Proceedings of EuroPVM/MPI. Lecture Notes in Computer Science. Springer-Verlag, Berlin (2004)
  13. Kielmann, T., Hofman, R.F.H., Bal, H.E., Plaat, A., Bhoedjang, R.A.F.: MagPIe: MPI’s collective communication operations for clustered wide area systems. In: Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming, pp. 131–140. ACM, New York (1999)
  14. Barchet-Estefanel, L.A., Mounié, G.: Fast tuning of intra-cluster collective communications. In: Proceedings, 11th European PVM/MPI Users’ Group Meeting, Budapest, Hungary, 2004, pp. 28–35
  15. Bell, C., Bonachea, D., Cote, Y., Duell, J., Hargrove, P., Husbands, P., Iancu, C., Welcome, M., Yelick, K.: An evaluation of current high-performance networks. In: Proceedings of the 17th International Symposium on Parallel and Distributed Processing, p. 28.1. IEEE Computer Society, Washington (2003)
  16. Bernaschi, M., Iannello, G., Lauria, M.: Efficient implementation of reduce-scatter in MPI. J. Syst. Archit. 49(3), 89–108 (2003)
  17. Bruck, J., Ho, C.T., Kipnis, S., Upfal, E., Weathersby, D.: Efficient algorithms for all-to-all communications in multiport message-passing systems. IEEE Trans. Parallel Distributed Syst. 8(11), 1143–1156 (1997)
  18. Kielmann, T., Bal, H.E., Gorlatch, S., Verstoep, K., Hofman, R.F.: Network performance-aware collective communication for clustered wide-area systems. Parallel Comput. 27(11), 1431–1456 (2001)
  19. Gropp, W., Lusk, E., Doss, N., Skjellum, A.: A high-performance, portable implementation of the MPI message passing interface standard. Parallel Comput. 22(6), 789–828 (1996)
  20. Gropp, W., Lusk, E.L.: Reproducible measurements of MPI performance characteristics. In: Proceedings of the 6th European PVM/MPI Users’ Group Meeting on Recent Advances in PVM and MPI, pp. 11–18. Springer-Verlag, London (1999)
  21. Gabriel, E., Fagg, G.E., Bosilca, G., Angskun, T., Dongarra, J.J., Squyres, J.M., Sahay, V., Kambadur, P., Barrett, B., Lumsdaine, A., Castain, R.H., Daniel, D.J., Graham, R.L., Woodall, T.S.: Open MPI: Goals, concept, and design of a next generation MPI implementation. In: Proceedings, 11th European PVM/MPI Users’ Group Meeting, Budapest, Hungary, 2004, pp. 97–104

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Jelena Pješivac-Grbović (1, corresponding author)
  • Thara Angskun (1)
  • George Bosilca (1)
  • Graham E. Fagg (1)
  • Edgar Gabriel (2)
  • Jack J. Dongarra (1)

  1. Innovative Computing Laboratory, Computer Science Department, University of Tennessee, Knoxville, USA
  2. Department of Computer Science, University of Houston, Houston, USA
