International Journal of Parallel Programming, Volume 42, Issue 6, pp 968–987

A Scalable Farm Skeleton for Hybrid Parallel and Distributed Programming


Abstract

Multi-core processors and clusters of multi-core processors are ubiquitous. They provide scalable performance, yet they introduce complex, low-level programming models for shared and distributed memory programming. Fully exploiting the potential of shared and distributed memory parallelization can therefore be a tedious and error-prone task: programmers must take care of low-level threading and communication (e.g., message passing) details. In order to assist programmers in developing efficient and reliable parallel applications, Algorithmic Skeletons have been proposed. They encapsulate well-defined, frequently recurring parallel and distributed programming patterns, thus shielding programmers from low-level aspects of parallel and distributed programming. In this paper, we present the design and implementation of the well-known Farm skeleton. In order to address the hybrid architecture of multi-core clusters, we present a two-tier implementation built on top of MPI and OpenMP. On the basis of three benchmark applications, a simple ray tracer, an interacting particles system, and an application for calculating the Mandelbrot set, we illustrate the advantages of both skeletal programming in general and this two-tier approach in particular.
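The two-tier idea summarized above can be pictured, independently of the authors' skeleton library and its API, as an MPI task farm whose workers parallelize locally with OpenMP: MPI distributes task blocks across the cluster nodes (tier one), and each worker node processes its block with OpenMP threads on its cores (tier two). The following sketch is only an illustration under that assumption; the names used (process_task, TAG_WORK, TASKS_PER_MSG), the block-wise dispatch, and the omission of result collection are choices made here for brevity and are not taken from the paper.

// A minimal, illustrative sketch of a two-tier task farm in plain MPI + OpenMP.
// It does not reproduce the paper's skeleton library or its API; the names
// below (process_task, TAG_*, TASKS_PER_MSG) are hypothetical placeholders,
// and result collection is elided for brevity.
#include <mpi.h>
#include <omp.h>
#include <vector>

static const int TAG_REQUEST = 1, TAG_WORK = 2, TAG_STOP = 3;
static const int NUM_TASKS = 1000, TASKS_PER_MSG = 10;

// The task function every farm worker applies (here: a toy computation).
static double process_task(double x) { return x * x; }

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                              // tier 1: the farmer process
        int next = 0, active = size - 1;
        while (active > 0) {
            int dummy;                            // wait for a work request
            MPI_Status st;
            MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, TAG_REQUEST,
                     MPI_COMM_WORLD, &st);
            if (next < NUM_TASKS) {               // dispatch the next task block
                std::vector<double> tasks(TASKS_PER_MSG);
                for (int i = 0; i < TASKS_PER_MSG; ++i) tasks[i] = next + i;
                MPI_Send(tasks.data(), TASKS_PER_MSG, MPI_DOUBLE,
                         st.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
                next += TASKS_PER_MSG;
            } else {                              // no tasks left: stop the worker
                double stop = 0.0;
                MPI_Send(&stop, 0, MPI_DOUBLE, st.MPI_SOURCE, TAG_STOP,
                         MPI_COMM_WORLD);
                --active;
            }
        }
    } else {                                      // worker processes
        std::vector<double> buf(TASKS_PER_MSG);
        while (true) {
            int request = rank;                   // ask the farmer for work
            MPI_Send(&request, 1, MPI_INT, 0, TAG_REQUEST, MPI_COMM_WORLD);
            MPI_Status st;
            MPI_Recv(buf.data(), TASKS_PER_MSG, MPI_DOUBLE, 0, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;    // farm is drained
            // Tier 2: process the received block with OpenMP threads on the node.
            #pragma omp parallel for
            for (int i = 0; i < TASKS_PER_MSG; ++i)
                buf[i] = process_task(buf[i]);
            // (Results would be returned with the next request; omitted here.)
        }
    }
    MPI_Finalize();
    return 0;
}

The dynamic dispatch, with idle workers requesting the next block from the farmer, is what distinguishes a farm from a static map and keeps the load balanced for irregular tasks such as ray tracing or Mandelbrot computation, while the OpenMP loop inside each worker exploits the cores of a single node.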

Keywords

High-level parallel programming · Algorithmic skeletons · Farm skeleton · Shared/distributed memory


Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

1. University of Muenster, Muenster, Germany
