Formal specification and implementation of an automated pattern-based parallel-code generation framework

Abstract

Programming correct parallel software in a cost-effective way is a challenging task that requires a high degree of expertise. In an attempt to overcome the pitfalls that undermine parallel programming, this paper proposes a pattern-based, formally grounded tool that eases the writing of parallel code by automatically generating platform-dependent programs from high-level, platform-independent specifications. The tool builds on three pillars: (1) a platform-agnostic parallel programming pattern, called PCR; (2) a formal translation of PCRs into a parallel execution model, namely Concurrent Collections (CnC); and (3) a program rewriting engine that generates code for a concrete runtime implementing CnC. The experimental evaluation shows that code generated from PCRs delivers performance comparable to that of handwritten code, while providing assured correctness. The technical contribution of this paper is threefold. First, it presents a parallel programming pattern, called PCR, consisting of producers, consumers, and reducers which operate concurrently on data sets. To support correctness, the semantics of PCRs is mathematically defined in terms of the FXML formalism. PCRs are shown to be composable and to seamlessly subsume other well-known parallel programming patterns, thus providing a framework for heterogeneous designs. Second, it formally shows how the PCR pattern can be correctly implemented in terms of a more concrete parallel execution model. Third, it proposes a platform-agnostic C++ template library to express PCRs, and presents a prototype source-to-source compilation tool, based on C++ template rewriting, which automatically generates parallel implementations relying on the Intel Concurrent Collections (CnC) C++ library.



Notes

  1. An in-depth discussion of parallel programming languages is out of the scope of this paper.

  2. Cyclic composition through recursion is discussed in Sect. 3.


Acknowledgements

Partially funded by LIA INFINIS (CNRS, Université Paris Diderot, CONICET, Universidad de Buenos Aires), PEDECIBA and SNI. Thanks to CSC-CONICET for granting use of cluster TUPAC.

Author information

Corresponding author

Correspondence to Gervasio Pérez.


Cite this article

Pérez, G., Yovine, S. Formal specification and implementation of an automated pattern-based parallel-code generation framework. Int J Softw Tools Technol Transfer 21, 183–202 (2019). https://doi.org/10.1007/s10009-017-0465-2

Keywords

  • Formal methods
  • Software design patterns
  • Parallel programming
  • Automated code generation