International Journal of Parallel Programming, Volume 42, Issue 4, pp 619–642

MulticoreBSP for C: A High-Performance Library for Shared-Memory Parallel Programming

  • A. N. Yzelman
  • R. H. Bisseling
  • D. Roose
  • K. Meerbergen

DOI: 10.1007/s10766-013-0262-9

Cite this article as:
Yzelman, A.N., Bisseling, R.H., Roose, D. et al. Int J Parallel Prog (2014) 42: 619. doi:10.1007/s10766-013-0262-9

Abstract

The bulk synchronous parallel (BSP) model, as well as parallel programming interfaces based on BSP, classically target distributed-memory parallel architectures. In earlier work, Yzelman and Bisseling designed a MulticoreBSP for Java library specifically for shared-memory architectures. In the present article, we further investigate this concept and introduce the new high-performance MulticoreBSP for C library. Among other features, this library supports nested BSP runs. We show that existing BSP software performs well regardless of whether it runs on distributed-memory or shared-memory architectures, and that applications written in MulticoreBSP can attain high performance. The paper details BSP implementations of the fast Fourier transform and the sparse matrix–vector multiplication, both of which outperform state-of-the-art implementations written in other shared-memory parallel programming interfaces. We furthermore study the applicability of BSP when working on highly non-uniform memory access architectures.

Keywords

High-performance computing; Bulk synchronous parallel; Shared-memory parallel programming; Software library; Fast Fourier transform; Sparse matrix–vector multiplication

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • A. N. Yzelman (1, 2)
  • R. H. Bisseling (3)
  • D. Roose (2)
  • K. Meerbergen (2)

  1. Flanders ExaScience Lab (Intel Labs Europe), Heverlee, Belgium
  2. Department of Computer Science, KU Leuven, Heverlee, Belgium
  3. Department of Mathematics, Utrecht University, Utrecht, The Netherlands