MPC and Coarray Fortran: Alternatives to Classic MPI Implementations on the Examples of Scalable Lattice Boltzmann Flow Solvers

  • Markus Wittmann
  • Georg Hager
  • Gerhard Wellein
  • Thomas Zeiser
  • Bettina Krammer
Conference paper


In recent years, more and more parallel programming concepts have emerged as alternatives or improvements to the well-established MPI standard. The arguments for these new parallel languages and alternative communication frameworks are typically the growing number of cores in modern systems and the hierarchical memory structure of clusters built from multi-socket multi-core compute nodes.
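To illustrate the kind of alternative the title refers to: Coarray Fortran (part of the Fortran 2008 standard) expresses communication through one-sided remote array access instead of explicit message passing. The following sketch is not taken from the paper; it is a minimal, hypothetical example of a periodic ghost-layer (halo) exchange of the sort a 1D-decomposed lattice Boltzmann solver would perform, written to contrast with the equivalent MPI send/receive pair.

```fortran
program caf_halo
  implicit none
  integer, parameter :: n = 8
  ! Local slab with one ghost cell on each side; the coarray
  ! declaration [*] makes f remotely accessible from all images.
  real :: f(0:n+1)[*]
  integer :: me, left, right

  me    = this_image()
  ! Periodic neighbours along the decomposed direction.
  left  = merge(num_images(), me - 1, me == 1)
  right = merge(1, me + 1, me == num_images())

  f = real(me)

  sync all                   ! neighbours must have written f first
  f(0)   = f(n)[left]        ! one-sided get from the left image
  f(n+1) = f(1)[right]       ! one-sided get from the right image
  sync all

  if (me == 1) print *, 'ghost cells on image 1:', f(0), f(n+1)
end program caf_halo
```

Where MPI would require matched `MPI_Send`/`MPI_Recv` (or `MPI_Sendrecv`) calls on both partners, the coarray version reads the neighbour's data directly; only the `sync all` statements order the accesses.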





This work was financially supported through the framework of the Competence Network for Technical, Scientific High Performance Computing in Bavaria (KONWIHR) and by BMBF under grant No. 01IH08003A (project SKALB).

It was also partly conducted at the Exascale Computing Research Center (ECR), with support provided by CEA, GENCI, Intel, and UVSQ. The hospitality of the Exascale Computing Research Center at Université de Versailles St-Quentin-en-Yvelines while working on the MPC benchmarks is gratefully acknowledged by Markus Wittmann.

Special thanks go to Prof. William Jalby for enabling this research visit at ECR, and to Marc Tchiboukdjian and Sylvain Didelot for their kind help with MPC.

The Coarray Fortran tests were carried out by Klaus Sembritzki as part of his Master's thesis, which was also conducted in cooperation with ECR at UVSQ.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the CEA, GENCI, Intel or UVSQ.



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Markus Wittmann (1)
  • Georg Hager (1)
  • Gerhard Wellein (1)
  • Thomas Zeiser (1)
  • Bettina Krammer (2)
  1. Regionales Rechenzentrum Erlangen (RRZE), Universität Erlangen-Nürnberg, Erlangen, Germany
  2. Exascale Computing Research Center, University of Versailles Saint-Quentin-en-Yvelines, Versailles, France
