Wait-Free Message Passing Protocol for Non-coherent Shared Memory Architectures

  • Isaías A. Comprés Ureña
  • Michael Gerndt
  • Carsten Trinitis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7490)

Abstract

The number of cores in future CPUs is expected to increase steadily. Balanced CPU designs scale hardware cache coherency functionality with the number of cores in order to minimize bottlenecks in parallel applications. An alternative approach is to do away with hardware coherence entirely; the Single-chip Cloud Computer (SCC), a 48-core experimental processor from Intel Labs, does exactly that. A wait-free protocol for message passing on non-coherent buffers was introduced with the RCKMPI library in order to support MPI on the SCC. In this work, the message passing performance of the protocol is modeled. Additionally, a port for symmetric multi-processors is introduced and used for comparison with MPICH2-Nemesis and Open MPI. Performance is analyzed based on statistics collected on a 4-dimensional space composed of source rank, target rank, message size and frequency.

Keywords

MPI, message passing, communication protocol, non-coherent shared memory, non-blocking, wait-free

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Isaías A. Comprés Ureña (1)
  • Michael Gerndt (1)
  • Carsten Trinitis (1)
  1. Institute of Informatics, Technical University of Munich (TUM), Garching, Germany
