Transparent Neutral Element Elimination in MPI Reduction Operations

  • Jesper Larsson Träff
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6305)

Abstract

We describe simple and easily implemented MPI-library-internal functionality that allows MPI reduction operations to be performed more efficiently as the sparsity (fraction of neutral elements for the given operator) of the input and intermediate result vectors increases. Using this functionality we give an implementation of the MPI_Reduce collective operation that exploits sparsity of both input and intermediate result vectors completely transparently to the application programmer. Experiments carried out on a 64-core Intel Nehalem multi-core cluster with InfiniBand interconnect show considerable and worthwhile improvements as the sparsity of the input grows: about a factor of three with 1% non-zero elements, which is close to the best possible for the approach. The overhead incurred for dense vectors is negligible compared to the same implementation that does not exploit sparsity of input and intermediate results. For both very small and large vectors, the implemented SPS_Reduce function is faster than the native MPI_Reduce of the MPI library used, indicating that the reported improvements are not artifacts of suboptimal reduction algorithms.
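The abstract only outlines the approach; the following is a minimal C sketch of the underlying idea of neutral-element elimination in a local reduction step, assuming MPI_SUM on doubles (neutral element 0.0) and a simple packed (index, value) representation. The names sparse_entry_t, pack_nonzero, and merge_packed are hypothetical illustrations and do not reproduce the paper's actual SPS_Reduce implementation.

#include <stdio.h>

typedef struct {
    int    index;   /* position of a non-neutral element in the dense vector */
    double value;   /* the element itself */
} sparse_entry_t;

/* Copy the non-neutral (non-zero for MPI_SUM) elements of a dense vector
 * into a compact (index, value) representation; returns the entry count. */
static int pack_nonzero(const double *dense, int n, sparse_entry_t *packed)
{
    int k = 0;
    for (int i = 0; i < n; i++) {
        if (dense[i] != 0.0) {          /* 0.0 is the neutral element of + */
            packed[k].index = i;
            packed[k].value = dense[i];
            k++;
        }
    }
    return k;
}

/* Sum a packed sparse vector into a dense accumulation buffer;
 * only the k non-neutral elements are touched. */
static void merge_packed(double *accum, const sparse_entry_t *packed, int k)
{
    for (int j = 0; j < k; j++)
        accum[packed[j].index] += packed[j].value;
}

int main(void)
{
    double v[8]   = { 0, 3.5, 0, 0, 1.0, 0, 0, 2.5 };  /* sparse input vector */
    double acc[8] = { 0 };                              /* accumulation buffer */
    sparse_entry_t packed[8];

    int k = pack_nonzero(v, 8, packed);   /* only 3 of 8 elements survive */
    merge_packed(acc, packed, k);         /* local reduction touches only 3 */
    printf("packed %d of 8 elements, acc[1] = %g\n", k, acc[1]);
    return 0;
}

In a tree- or pipeline-based reduction, each intermediate buffer could be packed in this way before being forwarded, so that both message size and local reduction work shrink with the fraction of non-neutral elements.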

Keywords

Reduction Algorithm · Neutral Element · Effective Bandwidth · Message Size · Local Reduction



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Jesper Larsson Träff
  1. Department of Scientific Computing, University of Vienna, Vienna, Austria
