Single Assignment C (SAC): High Productivity Meets High Performance

  • Clemens Grelck
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7241)

Abstract

We present the ins and outs of the purely functional, data-parallel programming language SaC (Single Assignment C). SaC defines state- and side-effect-free semantics on top of a syntax resembling that of imperative languages like C/C++/C# or Java: functional programming with curly brackets. In contrast to other functional languages, data aggregation in SaC is based not on lists and trees but on stateless arrays, which take centre stage in the language.
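The flavour of this "functional programming with curly brackets" can be sketched as follows. This is a hypothetical fragment in SaC-like notation (the function name is illustrative); it is meant only to convey that C-style control flow coexists with purely functional semantics:

```
/* Sketch in SaC-like notation: C syntax, purely functional meaning.
   Each "assignment" introduces a fresh binding rather than updating
   a memory cell; the loop is syntactic sugar that the compiler can
   treat as tail recursion, so no mutable state is ever observable. */
int gcd (int a, int b)
{
  while (b != 0) {
    t = b;        /* new binding, not a destructive update      */
    b = a % b;    /* re-binding b for the next loop "iteration" */
    a = t;
  }
  return a;
}
```

Because no state is mutated, the compiler remains free to reorder, fuse, or parallelise such code without the aliasing analyses that imperative languages require.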

SaC implements an abstract calculus of truly multidimensional arrays that is adopted from interpreted array languages like APL. Arrays are abstract values with certain structural properties. They are treated in a holistic way, not as loose collections of data cells or indexed memory address ranges. Programs can and should be written in a mostly index-free style. Functions consume array values as arguments and produce array values as results. The array type system of SaC allows such functions to abstract not only from the size of vectors or matrices but likewise from the number of array dimensions, supporting a highly generic programming style.
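Such rank-generic, index-free code might look as follows. This is a hypothetical sketch in SaC-like notation (the function name is illustrative): the type `double[*]` denotes an array of any rank, and the with-loop is SaC's data-parallel array comprehension:

```
/* Hypothetical sketch: one definition serves scalars, vectors,
   matrices and higher-ranked arrays alike.  The with-loop maps
   the body over every index vector iv of the argument array;
   the dots denote the full index range. */
double[*] relu (double[*] a)
{
  return with {
           ( . <= iv <= . ) : max (a[iv], 0.0);
         } : modarray (a);
}
```

A caller may apply the same definition to a vector or to a rank-3 volume; the compiler specialises it for the shapes that actually occur.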

The design of SaC aims at reconciling high productivity in software engineering of compute-intensive applications with high performance in program execution on modern multi- and many-core computing systems. While SaC competes with other functional and declarative languages on the productivity aspect, it competes with hand-parallelised C and Fortran code on the performance aspect. We achieve our goal through stringent co-design of programming language and compilation technology.

The focus on arrays in general, and the abstract view of arrays in particular, combined with a functional, state-free semantics, are key ingredients in the design of SaC. In conjunction they allow for far-reaching program transformations and fully compiler-directed parallelisation. From literally the same source code, SaC currently supports symmetric multi-socket, multi-core, hyperthreaded server systems, CUDA-enabled graphics accelerators, and the MicroGrid, an innovative general-purpose many-core architecture.

The CEFP lecture provides an introduction to the language design of SaC, followed by an illustration of how these concepts can be harnessed to write highly abstract, reusable, and elegant code. We conclude by outlining the major compiler technologies for achieving runtime performance levels that are competitive with low-level, machine-oriented programming environments.

Keywords

Index Vector, Runtime System, Functional Language, Target Architecture, Shape Vector

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Clemens Grelck
  1. Institute of Informatics, University of Amsterdam, Amsterdam, The Netherlands
