Abstract
This paper describes the stapl Skeleton Framework, a high-level skeletal approach for parallel programming. This framework abstracts the underlying details of data distribution and parallelism from programmers and enables them to express parallel programs as a composition of existing elementary skeletons such as map, map-reduce, scan, zip, butterfly, allreduce, alltoall and user-defined custom skeletons.
Skeletons in this framework are defined as parametric data flow graphs, and their compositions are defined in terms of data flow graph compositions. Defining composition in this manner allows dependencies between skeletons to be expressed as point-to-point dependencies, avoiding unnecessary global synchronizations. To show the ease of composability and expressivity, we implement the NAS Integer Sort (IS) and Embarrassingly Parallel (EP) benchmarks using skeletons and demonstrate performance comparable to the hand-optimized reference implementations. To demonstrate scalable performance, we show a transformation that enables applications written in terms of skeletons to run on more than 100,000 cores.
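To make the idea of skeletons as parametric data flow graphs more concrete, the following is a minimal illustrative sketch in Python. It is not the stapl API (the framework itself is a C++ library), and every name in it (`Graph`, `map_skeleton`, `reduce_skeleton`, `compose`, `run`) is hypothetical. Each skeleton is modeled as a function that, given a list of input nodes, adds nodes to a graph and returns its output nodes; composition simply feeds the output nodes of one skeleton into the next, so every dependency is an edge between two specific nodes rather than a barrier between phases.

```python
class Graph:
    """A data flow graph: each node is (function, ids of predecessor nodes)."""

    def __init__(self):
        self.nodes = []

    def add(self, fn, preds):
        # Append a node and return its id. Predecessors always exist already,
        # so the node list is in a valid topological order by construction.
        self.nodes.append((fn, tuple(preds)))
        return len(self.nodes) - 1


def map_skeleton(op):
    """Hypothetical map skeleton: one node per input, one edge per node."""
    def expand(g, ins):
        return [g.add(op, [i]) for i in ins]
    return expand


def reduce_skeleton(op):
    """Hypothetical reduce skeleton: a binary tree of pairwise combinations."""
    def expand(g, ins):
        level = list(ins)
        while len(level) > 1:
            nxt = [g.add(op, level[k:k + 2])
                   for k in range(0, len(level) - 1, 2)]
            if len(level) % 2:          # odd element passes through unchanged
                nxt.append(level[-1])
            level = nxt
        return level
    return expand


def compose(*skeletons):
    """Compose skeletons by wiring each one's outputs to the next one's inputs."""
    def expand(g, ins):
        for s in skeletons:
            ins = s(g, ins)
        return ins
    return expand


def run(skeleton, data):
    """Expand the skeleton for this input size, then evaluate it sequentially."""
    g = Graph()
    ins = [g.add(lambda v=v: v, []) for v in data]   # source nodes
    outs = skeleton(g, ins)
    values = []
    for fn, preds in g.nodes:                        # topological order
        values.append(fn(*(values[p] for p in preds)))
    return [values[o] for o in outs], g


# map-reduce expressed as a composition of two elementary skeletons
map_reduce = compose(map_skeleton(lambda x: x * x),
                     reduce_skeleton(lambda a, b: a + b))
result, graph = run(map_reduce, [1, 2, 3, 4])
print(result)  # [30]
```

The evaluator here is deliberately sequential. Because each node lists only the nodes it reads from, a parallel runtime could in principle start any node as soon as its predecessors have finished, which is the property the abstract describes as avoiding unnecessary global synchronizations.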
This research was supported in part by NSF awards CNS-0551685, CCF-0833199, CCF-0830753, IIS-0916053, IIS-0917266, EFRI-1240483, RI-1217991, by NIH NCI R25 CA090301-11, by DOE awards DE-AC02-06CH11357, DE-NA0002376, B575363, by Samsung, Chevron, IBM, Intel, Oracle/Sun and by Award KUS-C1-016-04, made by King Abdullah University of Science and Technology (KAUST). This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
Acknowledgments
We would like to thank Adam Fidel for his help with the experimental evaluation.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Zandifar, M., Thomas, N., Amato, N.M., Rauchwerger, L. (2015). The stapl Skeleton Framework. In: Brodman, J., Tu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2014. Lecture Notes in Computer Science, vol 8967. Springer, Cham. https://doi.org/10.1007/978-3-319-17473-0_12
DOI: https://doi.org/10.1007/978-3-319-17473-0_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17472-3
Online ISBN: 978-3-319-17473-0
eBook Packages: Computer Science, Computer Science (R0)