Bulk: A Modern C++ Interface for Bulk-Synchronous Parallel Programs

  • Jan-Willem Buurlage
  • Tom Bannink
  • Rob H. Bisseling
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11014)


The bulk-synchronous parallel (BSP) programming model provides a powerful method for implementing and describing parallel programs. In this article we present Bulk, a novel interface for writing BSP programs in the C++ programming language that leverages modern C++ features to allow for the implementation of safe and generic parallel algorithms for shared-memory, distributed-memory, and hybrid systems. This interface targets the next generation of BSP programmers who want to write fast, safe, clear, and portable parallel programs. We discuss two applications, regular sample sort and the fast Fourier transform, in terms of both performance and ease of parallel implementation.


References

  1. Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990)
  2. Hill, J.M.D., et al.: BSPlib: the BSP programming library. Parallel Comput. 24(14), 1947–1980 (1998)
  3. Suijlen, W.: BSPonMPI v0.3.
  4. Yzelman, A.N., Bisseling, R.H.: An object-oriented bulk synchronous parallel library for multicore programming. Concurr. Comput.: Pract. Exp. 24(5), 533–553 (2012)
  5. Yzelman, A.N., Bisseling, R.H., Roose, D., Meerbergen, K.: MulticoreBSP for C: a high-performance library for shared-memory parallel programming. Int. J. Parallel Program. 42(4), 619–642 (2014)
  6. Bonorden, O., Juurlink, B., von Otte, I., Rieping, I.: The Paderborn University BSP (PUB) library. Parallel Comput. 29(2), 187–207 (2003)
  7. Loulergue, F., Gava, F., Billiet, D.: Bulk synchronous parallel ML: modular implementation and performance prediction. In: Sunderam, V.S., van Albada, G.D., Sloot, P.M.A., Dongarra, J.J. (eds.) ICCS 2005. LNCS, vol. 3515, pp. 1046–1054. Springer, Heidelberg (2005)
  8. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Proceedings of OSDI, pp. 137–149 (2004)
  9. Malewicz, G., et al.: Pregel: a system for large-scale graph processing. In: Proceedings of SIGMOD, pp. 135–145 (2010)
  10. Ching, A., Edunov, S., Kabiljo, M., Logothetis, D., Muthukrishnan, S.: One trillion edges: graph processing at Facebook-scale. VLDB 8(12), 1804–1815 (2015)
  11. Siddique, K., Akhtar, Z., Yoon, E.J., Jeong, Y.S., Dasgupta, D., Kim, Y.: Apache Hama: an emerging bulk synchronous parallel computing framework for big data applications. IEEE Access 4, 8879–8887 (2016)
  12. Heller, T., Diehl, P., Byerly, Z., Biddiscombe, J., Kaiser, H.: HPX - an open source C++ standard library for parallelism and concurrency. In: Proceedings of OpenSuCo, p. 5 (2017)
  13. Zheng, Y., Kamil, A., Driscoll, M.B., Shan, H., Yelick, K.: UPC++: a PGAS extension for C++. In: Proceedings of IEEE IPDPS, pp. 1105–1114 (2014)
  14. Hamidouche, K., Falcou, J., Etiemble, D.: Hybrid bulk synchronous parallelism library for clustered SMP architectures. In: Proceedings of HLPP, pp. 55–62 (2010)
  15. Valiant, L.G.: A bridging model for multi-core computing. J. Comput. Syst. Sci. 77(1), 154–166 (2011)
  16. Keßler, C.W.: NestStep: nested parallelism and virtual shared memory for the BSP model. J. Supercomput. 17(3), 245–262 (2000)
  17. ISO/IEC: 14882:2017(E) - Programming languages - C++ (2017)
  18. Numrich, R.W., Reid, J.: Co-array Fortran for parallel programming. ACM SIGPLAN Fortran Forum 17(2), 1–31 (1998)
  19. MPI Forum: MPI: a message-passing interface standard. Int. J. Supercomput. Appl. High-Perform. Comput. 8, 165–414 (1994)
  20. Olofsson, A., Nordström, T., Ul-Abdin, Z.: Kickstarting high-performance energy-efficient manycore architectures with Epiphany. In: Proceedings of IEEE ACSSC, pp. 1719–1726 (2014)
  21. Shi, H., Schaeffer, J.: Parallel sorting by regular sampling. J. Parallel Distrib. Comput. 14(4), 361–372 (1992)
  22. Hill, J.M.D., Donaldson, S.R., Skillicorn, D.B.: Portability of performance with the BSPLib communications library. In: Proceedings of MPPM, p. 33 (1997)
  23. Gerbessiotis, A.V.: Extending the BSP model for multi-core and out-of-core computing: MBSP. Parallel Comput. 41(Suppl. C), 90–102 (2015)
  24. Inda, M.A., Bisseling, R.H.: A simple and efficient parallel FFT algorithm using the BSP model. Parallel Comput. 27(14), 1847–1878 (2001)
  25. Bisseling, R.H.: Parallel Scientific Computation: A Structured Approach using BSP and MPI. Oxford University Press, Oxford (2004)
  26. Frigo, M., Johnson, S.G.: FFTW: an adaptive software architecture for the FFT. In: Proceedings of IEEE ICASSP, pp. 1381–1384 (1998)

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
  2. QuSoft, Amsterdam, The Netherlands
  3. Mathematical Institute, Utrecht University, Utrecht, The Netherlands