OSL: Optimized Bulk Synchronous Parallel Skeletons on Distributed Arrays

  • Noman Javed
  • Frédéric Loulergue
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5737)

Abstract

Existing solutions for programming parallel architectures range from parallelizing compilers to distributed concurrent programming. Intermediate approaches propose a more structured form of parallelism: algorithmic skeletons are higher-order functions that capture the patterns of parallel algorithms. The user of a skeleton library only has to compose some of the skeletons to write her parallel application. When designing a parallel program, parallel performance is important. It is therefore valuable for the programmer to rely on a simple yet realistic parallel performance model such as the Bulk Synchronous Parallel (BSP) model. We present OSL, the Orléans Skeleton Library, a library of BSP algorithmic skeletons in C++. It offers data-parallel skeletons on arrays as well as communication-oriented skeletons. The performance of OSL is demonstrated with two applications: the heat equation and the FFT.
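To make the notion of a skeleton as a higher-order function concrete, the following is a minimal sketch that simulates a distributed array sequentially. It is not the OSL API: the names `DistArray`, `distribute`, `map_skel`, `reduce_skel`, `bsp_superstep_cost` and `sum_of_squares_demo` are hypothetical and chosen only for illustration. The cost function uses the standard BSP estimate for one superstep, w + h·g + l, where w is the maximum local work, h the maximum number of words sent or received by a processor, g the communication gap and l the barrier latency.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for a distributed array: one local block per
// simulated BSP processor. This is NOT the OSL API.
template <typename T>
struct DistArray {
    std::vector<std::vector<T>> blocks;  // blocks[i] belongs to processor i
};

// Split `data` into p contiguous blocks of nearly equal size.
template <typename T>
DistArray<T> distribute(const std::vector<T>& data, std::size_t p) {
    assert(p > 0);
    DistArray<T> out;
    out.blocks.resize(p);
    const std::size_t n = data.size();
    for (std::size_t i = 0; i < p; ++i) {
        const auto lo = static_cast<std::ptrdiff_t>(i * n / p);
        const auto hi = static_cast<std::ptrdiff_t>((i + 1) * n / p);
        out.blocks[i].assign(data.begin() + lo, data.begin() + hi);
    }
    return out;
}

// Map skeleton: applies f to every element. Each block is processed
// independently, so no communication would be needed.
template <typename T, typename F>
auto map_skel(const DistArray<T>& a, F f)
    -> DistArray<decltype(f(std::declval<const T&>()))> {
    using R = decltype(f(std::declval<const T&>()));
    DistArray<R> out;
    out.blocks.resize(a.blocks.size());
    for (std::size_t i = 0; i < a.blocks.size(); ++i) {
        out.blocks[i].reserve(a.blocks[i].size());
        for (const T& x : a.blocks[i]) out.blocks[i].push_back(f(x));
    }
    return out;
}

// Reduce skeleton: op must be associative and `identity` its neutral element.
template <typename T, typename Op>
T reduce_skel(const DistArray<T>& a, Op op, T identity) {
    // Local phase: every processor folds its own block.
    std::vector<T> partial;
    for (const auto& block : a.blocks)
        partial.push_back(
            std::accumulate(block.begin(), block.end(), identity, op));
    // Global phase: on a real machine this would be a communication step
    // followed by a barrier; here the partial results are simply folded.
    return std::accumulate(partial.begin(), partial.end(), identity, op);
}

// Standard BSP cost estimate of one superstep: w + h*g + l.
double bsp_superstep_cost(double w, double h, double g, double l) {
    return w + h * g + l;
}

// Composition of skeletons: sum of the squares of 1..7 on 3 simulated
// processors.
int sum_of_squares_demo() {
    const std::vector<int> v{1, 2, 3, 4, 5, 6, 7};
    const auto d = distribute(v, 3);
    const auto sq = map_skel(d, [](int x) { return x * x; });
    return reduce_skel(sq, std::plus<int>(), 0);
}
```

In this sketch the application is written purely by composing `map_skel` and `reduce_skel`; a real skeleton library would replace the sequential loops over blocks with per-processor execution and actual communication, which is where a cost model such as BSP becomes useful for predicting performance.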

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Noman Javed (1)
  • Frédéric Loulergue (1)
  1. Université d’Orléans – LIFO, France
