Synthesizing Code for GPGPUs from Abstract Formal Models

  • Gabriel Hjort Blindell
  • Christian Menne
  • Ingo Sander
Chapter

Abstract

Today, multiple frameworks exist for easing the task of writing programs for GPGPUs, which are massively data-parallel execution platforms. These frameworks are needed because writing correct and high-performing applications for GPGPUs is notoriously difficult due to the intricacies of the underlying architecture. However, the existing frameworks lack a formal foundation, which makes them difficult to use together with formal verification, testing, and design space exploration. In this chapter we present a novel software synthesis tool, called f2cc, which is capable of generating efficient GPGPU code from abstract formal models based on the synchronous model of computation. These models can be built using high-level modeling methodologies that hide low-level architecture details from the developer. The correctness of the tool has been experimentally validated on models derived from two applications. The experiments also demonstrate that the synthesized GPGPU code yielded a 28× speedup when executed on a graphics card with 96 cores, compared against a sequential version that uses only the CPU.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Gabriel Hjort Blindell (1)
  • Christian Menne (1)
  • Ingo Sander (1)
  1. Department of Electronic Systems, School of Information and Communication Technology, KTH Royal Institute of Technology, Stockholm, Sweden
