An Abstract Annotation Model for Skeletons

  • Marco Aldinucci
  • Sonia Campa
  • Peter Kilpatrick
  • Fabio Tordini
  • Massimo Torquati
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7542)

Abstract

Multi-core and many-core platforms are becoming increasingly heterogeneous and asymmetric. This significantly increases the porting and tuning effort required for parallel codes, which in turn often widens the gap between peak machine power and actual application performance. This work discusses a first step toward the automated optimization of high-level skeleton-based parallel code. The paper presents an abstract annotation model for skeleton programs, aimed at formally describing suitable mappings of parallel activities onto a high-level platform representation. The derived mapping and scheduling strategies are used to generate optimized run-time code.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Marco Aldinucci (1)
  • Sonia Campa (2)
  • Peter Kilpatrick (3)
  • Fabio Tordini (1)
  • Massimo Torquati (2)

  1. Computer Science Department, University of Torino, Italy
  2. Computer Science Department, University of Pisa, Italy
  3. Computer Science Department, Queen’s University Belfast, UK
