Implementing Fusion-Equipped Parallel Skeletons by Expression Templates

  • Kiminori Matsuzaki
  • Kento Emoto
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6041)


Developing efficient parallel programs is more difficult and complicated than developing sequential ones. Skeletal parallelism is a promising methodology for easy parallel programming, in which users develop parallel programs by composing ready-made components called parallel skeletons. We developed SkeTo, a parallel skeleton library that provides parallel skeletons implemented in C++ and MPI for distributed-memory environments. In the new version of the library, the implementation of the parallel skeletons for lists is improved so that the skeletons themselves perform fusion optimization. The optimization mechanism is based on the programming technique called expression templates. In this paper, we illustrate the improved design and implementation of the parallel skeletons for lists in the SkeTo library.
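To make the idea of fusion by expression templates concrete, the following is a minimal sequential sketch and not SkeTo's actual API. All names in it (`fusion_sketch`, `MapExpr`, `ListRef`, `list_view`, `lazy_map`, `fused_reduce`) are hypothetical and chosen only for illustration, and it assumes a C++14 compiler. A map call only builds a lightweight expression object; the single loop runs in the reduce step, so composed maps allocate no intermediate lists.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace fusion_sketch {  // hypothetical namespace, not part of SkeTo

// Leaf node: a non-owning view of a concrete list.
template <typename T>
struct ListRef {
    const std::vector<T>* data;
    std::size_t size() const { return data->size(); }
    const T& operator[](std::size_t i) const {
        assert(i < data->size());
        return (*data)[i];
    }
};

// Lazy node meaning "apply f to every element of src".
// Constructing it runs no loop and allocates no intermediate list.
template <typename F, typename Src>
struct MapExpr {
    F f;
    Src src;  // held by value: a ListRef or another MapExpr
    std::size_t size() const { return src.size(); }
    auto operator[](std::size_t i) const { return f(src[i]); }
};

template <typename T>
ListRef<T> list_view(const std::vector<T>& v) { return ListRef<T>{&v}; }

// Map-like skeleton: returns an expression node instead of a new list.
template <typename F, typename Src>
MapExpr<F, Src> lazy_map(F f, Src src) { return MapExpr<F, Src>{f, src}; }

// Reduce-like skeleton: the only place where a loop is executed.
// Each e[i] evaluates the whole chain of mapped functions for element i.
template <typename Op, typename Expr, typename T>
T fused_reduce(Op op, const Expr& e, T init) {
    T acc = init;
    for (std::size_t i = 0; i < e.size(); ++i) acc = op(acc, e[i]);
    return acc;
}

struct Square { int operator()(int x) const { return x * x; } };
struct AddOne { int operator()(int x) const { return x + 1; } };
struct Plus   { int operator()(int a, int b) const { return a + b; } };

// Computes the sum of (x * x + 1) over xs in one pass.
inline int sum_of_squares_plus_one(const std::vector<int>& xs) {
    return fused_reduce(Plus{}, lazy_map(AddOne{}, lazy_map(Square{}, list_view(xs))), 0);
}

}  // namespace fusion_sketch
```

Because the type of the nested `MapExpr` encodes the whole composition, the compiler can inline `Square` and `AddOne` into the loop body of `fused_reduce`. A distributed-memory implementation would additionally need to partition the list and combine partial results across processes, which this sketch leaves out.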


Keywords: skeletal parallelism, fusion transformation, list skeletons, expression templates, template meta-programming





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Kiminori Matsuzaki (1)
  • Kento Emoto (2)
  1. School of Information, Kochi University of Technology, Japan
  2. Graduate School of Information Science and Technology, University of Tokyo, Japan
