
Formal specification and implementation of an automated pattern-based parallel-code generation framework

  • Gervasio Pérez
  • Sergio Yovine
Regular Paper

Abstract

Programming correct parallel software in a cost-effective way is a challenging task requiring a high degree of expertise. In an attempt to overcome the pitfalls that undermine parallel programming, this paper proposes a pattern-based, formally grounded tool that eases writing parallel code by automatically generating platform-dependent programs from high-level, platform-independent specifications. The tool builds on three pillars: (1) a platform-agnostic parallel programming pattern, called PCR, (2) a formal translation of PCRs into a parallel execution model, namely Concurrent Collections (CnC), and (3) a program rewriting engine that generates code for a concrete runtime implementing CnC. The experimental evaluation carried out gives evidence that code generated from PCRs can deliver performance comparable to handwritten code, with correctness assured by construction. The technical contribution of this paper is threefold. First, it discusses a parallel programming pattern, called PCR, consisting of producers, consumers, and reducers which operate concurrently on data sets. To favor correctness, the semantics of PCRs is mathematically defined in terms of the formalism FXML. PCRs are shown to be composable and to seamlessly subsume other well-known parallel programming patterns, thus providing a framework for heterogeneous designs. Second, it formally shows how the PCR pattern can be correctly implemented in terms of a more concrete parallel execution model. Third, it proposes a platform-agnostic C++ template library to express PCRs. It presents a prototype source-to-source compilation tool, based on C++ template rewriting, which automatically generates parallel implementations relying on the Intel CnC C++ library.
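To make the producer/consumer/reducer structure concrete, the following is a minimal, hypothetical C++ sketch of a PCR. The names (`pcr::Producer`, `pcr::Consumer`, `pcr::Reducer`, `pcr::run`) are illustrative assumptions, not the paper's actual template library; the sequential fold in `run` plays the role of the pattern's reference semantics, which a backend such as Intel CnC could evaluate concurrently when the reducer is associative.

```cpp
// Hypothetical sketch of the PCR (producer/consumer/reducer) pattern.
// All names in namespace pcr are illustrative, not the paper's API.
#include <functional>
#include <iostream>
#include <numeric>
#include <vector>

namespace pcr {

// Produce a data set from an input value.
template <typename In, typename Item>
using Producer = std::function<std::vector<Item>(const In&)>;

// Consume one item independently of the others (the parallelizable part).
template <typename Item, typename Out>
using Consumer = std::function<Out(const Item&)>;

// Reduce consumer outputs into one result; assumed associative, so a
// parallel runtime may apply it in any order.
template <typename Out>
using Reducer = std::function<Out(const Out&, const Out&)>;

// Reference (sequential) semantics of a PCR: produce, map, fold.
template <typename In, typename Item, typename Out>
Out run(const Producer<In, Item>& produce,
        const Consumer<Item, Out>& consume,
        const Reducer<Out>& reduce,
        const In& input, Out init) {
    Out acc = init;
    for (const Item& item : produce(input))
        acc = reduce(acc, consume(item));
    return acc;
}

} // namespace pcr

int main() {
    // Sum of squares of 1..n expressed as a PCR.
    pcr::Producer<int, int> produce = [](const int& n) {
        std::vector<int> v(n);
        std::iota(v.begin(), v.end(), 1);  // 1, 2, ..., n
        return v;
    };
    pcr::Consumer<int, long> consume =
        [](const int& x) { return static_cast<long>(x) * x; };
    pcr::Reducer<long> reduce =
        [](const long& a, const long& b) { return a + b; };

    std::cout << pcr::run(produce, consume, reduce, 10, 0L) << "\n";  // 385
    return 0;
}
```

Under these assumptions, the source-to-source tool described in the abstract would rewrite such a platform-independent composition into step, item, and tag collections of the Intel CnC C++ runtime, rather than executing the fold sequentially as above.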

Keywords

Formal methods · Software design patterns · Parallel programming · Automated code generation

Notes

Acknowledgements

Partially funded by LIA INFINIS (CNRS, Université Paris Diderot, CONICET, Universidad de Buenos Aires), PEDECIBA and SNI. Thanks to CSC-CONICET for granting use of cluster TUPAC.


Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  1. ICC-CONICET and Universidad de Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina
