Impact of Kernel-Assisted MPI Communication over Scientific Applications: CPMD and FFTW

  • Teng Ma
  • Aurelien Bouteiller
  • George Bosilca
  • Jack J. Dongarra
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6960)


Collective communication is one of the most powerful message passing concepts: it lets parallel applications express complex communication patterns while allowing the underlying MPI implementation to minimize the cost of data movement. However, as nodes become more heterogeneous, particularly in their memory hierarchies, harnessing their maximum compute capability becomes increasingly difficult. This paper investigates the impact of kernel-assisted MPI communication on two scientific applications: 1) Car-Parrinello Molecular Dynamics (CPMD), a chemical molecular dynamics application, and 2) FFTW, a Discrete Fourier Transform (DFT) library. By focusing on how each application uses the Message Passing Interface (MPI), we identified its communication characteristics and patterns. Our experiments indicate that the quality of the collective communication implementation on a specific machine plays a critical role in overall application performance.


Keywords: Discrete Fourier Transform, Shared Memory, Message Passing Interface, Collective Communication, Collective Operation





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Teng Ma (1)
  • Aurelien Bouteiller (1)
  • George Bosilca (1)
  • Jack J. Dongarra (1)
  1. Innovative Computing Laboratory, EECS, University of Tennessee, USA
