Pattern-Independent Detection of Manual Collectives in MPI Programs

  • Alexandru Calotoiu
  • Christian Siebert
  • Felix Wolf
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7484)


In parallel applications, a significant amount of communication occurs in a collective fashion to perform, for example, broadcasts, reductions, or complete exchanges. Although the MPI standard defines many convenience functions for this purpose, which not only improve code readability and maintenance but are usually also highly efficient, many application programmers still create their own, manual implementations using point-to-point communication. We show how instances of such hand-crafted collectives can be automatically detected. Matching pre- and post-conditions of hashed message exchanges recorded in event traces, our method is independent of the specific communication pattern employed. We demonstrate that replacing detected broadcasts in the HPL benchmark can yield significant performance improvements.
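The core idea, detecting a collective by its effect rather than its message pattern, can be illustrated with a minimal sketch. This is not the paper's implementation; it merely assumes we have per-rank buffer hashes recorded before and after a communication phase (standing in for the hashed message exchanges in the event trace) and checks the broadcast post-condition: every rank ends up holding the root's original data, no matter which point-to-point messages carried it there.

```python
import hashlib

def buf_hash(buf):
    # Hash the byte content of a message buffer (a stand-in for the
    # hashed payloads recorded in the event trace).
    return hashlib.sha256(bytes(buf)).hexdigest()

def detect_broadcast(pre, post, root):
    """Pattern-independent broadcast check (illustrative sketch):
    the phase behaves like MPI_Bcast from `root` iff, afterwards,
    every rank's buffer matches the root's pre-phase buffer."""
    target = buf_hash(pre[root])
    return all(buf_hash(b) == target for b in post)

# Example: 4 ranks; rank 0's payload reaches everyone through some
# hand-crafted exchange (e.g., a binomial tree of sends/receives).
pre  = [[42, 7], [0, 0], [0, 0], [0, 0]]
post = [[42, 7], [42, 7], [42, 7], [42, 7]]
print(detect_broadcast(pre, post, root=0))  # True
```

Because only the pre- and post-conditions are compared, the check succeeds for a linear chain, a binomial tree, or any other manual broadcast topology alike.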


Keywords: MPI, collective operations, performance optimization, HPL





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Alexandru Calotoiu (1, 2)
  • Christian Siebert (1, 2)
  • Felix Wolf (1, 2, 3)

  1. German Research School for Simulation Sciences, Aachen, Germany
  2. Computer Science Department, RWTH Aachen University, Aachen, Germany
  3. Forschungszentrum Jülich, Jülich Supercomputing Centre, Jülich, Germany
