The Journal of Supercomputing, Volume 61, Issue 1, pp 118–140

StreamPI: a stream-parallel programming extension for object-oriented programming languages

  • Jingun Hong
  • Kirak Hong
  • Bernd Burgstaller
  • Johann Blieberger

Abstract

Now that multicore CPUs are standard across all major hardware manufacturers, it is increasingly important for programming languages to provide abstractions that map effectively onto parallel architectures. Stream processing is a programming paradigm in which computations are expressed as independent actors that communicate via FIFO data channels. The coarse-grained parallelism exposed in stream programs facilitates an efficient mapping of actors onto the underlying multicore hardware.
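As an illustration of the paradigm only, the following minimal Python sketch runs three actors as threads connected by FIFO channels. It does not use the StreamPI API: the names `source`, `scale`, `sink`, `run_pipeline`, and the `SENTINEL` end-of-stream marker are all assumptions made for this example, and Python's `queue.Queue` stands in for a FIFO data channel.

```python
import queue
import threading

# End-of-stream marker; an assumption of this sketch, not part of StreamPI.
SENTINEL = object()


def source(out_ch, items):
    """Actor that pushes every input item onto its output channel."""
    for item in items:
        out_ch.put(item)
    out_ch.put(SENTINEL)


def scale(in_ch, out_ch, factor):
    """Actor that pops one item, multiplies it by factor, and pushes the result."""
    while True:
        item = in_ch.get()
        if item is SENTINEL:
            out_ch.put(SENTINEL)  # forward end-of-stream downstream
            return
        out_ch.put(item * factor)


def sink(in_ch, results):
    """Actor that collects everything arriving on its input channel."""
    while True:
        item = in_ch.get()
        if item is SENTINEL:
            return
        results.append(item)


def run_pipeline(items, factor):
    """Wire source -> scale -> sink with two FIFO channels and run them."""
    a_to_b = queue.Queue()
    b_to_c = queue.Queue()
    results = []
    actors = [
        threading.Thread(target=source, args=(a_to_b, items)),
        threading.Thread(target=scale, args=(a_to_b, b_to_c, factor)),
        threading.Thread(target=sink, args=(b_to_c, results)),
    ]
    for actor in actors:
        actor.start()
    for actor in actors:
        actor.join()
    return results
```

Because each actor touches only its own channels, the actors can be scheduled independently. In CPython the global interpreter lock prevents these threads from running Python code in parallel, so the sketch shows the structure of a stream program rather than any speedup.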

We propose StreamPI, a stream-parallel programming abstraction that extends object-oriented languages with stream-programming facilities. StreamPI consists of a class hierarchy for actor specification together with a language-independent runtime system that supports the execution of stream programs on multicore architectures. We show that the language-specific part of StreamPI, i.e., the class hierarchy, can be implemented as a library-level programming language extension. A library-level extension has the advantage that the existing programming language implementation need not be modified. Legacy code can be mixed with a stream-parallel application, and the use of sequential legacy code with actors is supported. Unlike previous approaches, StreamPI allows dynamic creation and subsequent execution of stream programs. StreamPI actors are typed. Type safety is achieved through type checks at stream-graph creation time.
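The idea of checking types when the stream graph is created, rather than when data flows through it, can be sketched as follows. This is a hypothetical illustration and not StreamPI's class hierarchy: the `Actor` and `Pipeline` classes, their fields, and the helper functions are assumptions invented for the example, and the graph is restricted to a linear pipeline that is executed sequentially.

```python
class Actor:
    """Hypothetical typed actor: declares the item types it consumes and produces."""

    def __init__(self, name, in_type, out_type, work):
        self.name = name
        self.in_type = in_type
        self.out_type = out_type
        self.work = work  # function applied to each input item


class Pipeline:
    """Hypothetical linear stream graph built dynamically at run time."""

    def __init__(self):
        self.actors = []

    def add(self, actor):
        # The type check happens here, at graph creation time,
        # before any data item is processed.
        if self.actors and self.actors[-1].out_type is not actor.in_type:
            prev = self.actors[-1]
            raise TypeError(
                f"{prev.name} produces {prev.out_type.__name__}, "
                f"but {actor.name} consumes {actor.in_type.__name__}"
            )
        self.actors.append(actor)
        return self

    def run(self, items):
        # Sequential execution; a real runtime would map actors onto cores.
        results = []
        for item in items:
            for actor in self.actors:
                item = actor.work(item)
            results.append(item)
        return results


def build_valid():
    """int -> float -> str: adjacent types match, so construction succeeds."""
    return (Pipeline()
            .add(Actor("halve", int, float, lambda x: x / 2))
            .add(Actor("show", float, str, lambda x: f"{x:.1f}")))


def rejects_mismatch():
    """Return True if connecting a float producer to an int consumer is refused."""
    try:
        (Pipeline()
         .add(Actor("halve", int, float, lambda x: x / 2))
         .add(Actor("negate", int, int, lambda x: -x)))
    except TypeError:
        return True
    return False
```

In this sketch a mismatched connection fails inside `add`, so an ill-typed graph never reaches execution, even though the graph itself is assembled dynamically.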

We have implemented StreamPI’s language-independent runtime system and language interfaces for Ada 2005 and C++ for Intel multicore architectures. We have evaluated StreamPI on up to 16 cores of a server with two 8-core Intel Xeon X7560 CPUs, and we provide a performance comparison with StreamIt (Gordon et al. in International Conference on Architectural Support for Programming Languages and Operating Systems, 2006), which is the de facto standard for stream-parallel programming. Although our approach provides greater programming flexibility than StreamIt, the performance of StreamPI compares favorably to that of StreamIt’s static compilation model.

Keywords

  • Programming language support for multicore architectures
  • Stream-parallel programming abstraction
  • Synchronous data-flow
  • Multicore architectures

References

  1. Amarasinghe S, Gordon MI, Karczmarek M, Lin J, Maze D, Rabbah RM, Thies W (2005) Language and compiler design for streaming applications. Int J Parallel Program 33(2):261–278
  2. Andrews J, Baker N (2006) Xbox 360 system architecture. IEEE MICRO 26(2):25–37
  3. Bhattacharyya SS, Lee EA, Murthy PK (1996) Software synthesis from dataflow graphs. Kluwer Academic, Norwell
  4. Belina F, Hogrefe D (1989) The CCITT-specification and description language SDL. Comput Netw 16:311–341
  5. Berry G, Gonthier G (1992) The Esterel synchronous programming language: design, semantics, implementation. Sci Comput Program 19(2):87–152
  6. Bryant RE, O’Halloran DR (2003) Computer systems: a programmer’s perspective. Prentice-Hall, New York
  7. Buttlar D, Farrell J, Nichols B (1996) PThreads programming. O’Reilly, Sebastopol
  8. Carpenter PM, Ramirez A, Ayguade E (2009) Mapping stream programs onto heterogeneous multiprocessor systems. In: CASES ’09: proceedings of the 2009 international conference on compilers, architecture, and synthesis for embedded systems. ACM Press, New York, pp 57–66
  9. Caspi P, Pilaud D, Halbwachs N, Plaice J (1987) Lustre: a declarative language for programming synchronous systems. In: Proceedings of the 14th ACM conference on principles of programming languages, pp 178–188
  10. Chen MK, Li XF, Lian R, Lin JH, Liu L, Liu T, Ju R (2005) Shangri-la: Achieving high performance from compiled network applications while enabling ease of programming. In: PLDI ’05: proceedings of the 2005 ACM SIGPLAN conference on programming language design and implementation. ACM Press, New York
  11. Farhad SM, Ko Y, Burgstaller B, Scholz B (2011) Orchestration by approximation: mapping stream programs onto multicore architectures. In: Proceedings of the sixteenth international conference on architectural support for programming languages and operating systems, ASPLOS ’11, New York, NY, USA. ACM Press, New York, pp 357–368
  12. Google (2009) The Go programming language specification, retrieved Nov 2009. http://golang.org
  13. Gordon MI, Thies W, Amarasinghe S (2006) Exploiting coarse-grained task, data, and pipeline parallelism in stream programs. In: International conference on architectural support for programming languages and operating systems, San Jose, CA
  14. Gummaraju J, Rosenblum M (2005) Stream programming on general-purpose processors. In: MICRO 38: proceedings of the 38th annual IEEE/ACM international symposium on microarchitecture. IEEE Computer Society Press, Los Alamitos, pp 343–354
  15. Gupta R, Hill CR (1990) A scalable implementation of barrier synchronization using an adaptive combining tree. Int J Parallel Program 18:161–180
  16. Hagiescu A, Wong W, Bacon DF, Rabbah R (2009) A computing origami: folding streams in FPGAs. In: DAC ’09: proceedings of the 2009 design automation conference. ACM Press, New York
  17. Herlihy M, Shavit N (2008) The art of multiprocessor programming. Morgan Kaufmann, San Mateo
  18. Hofstee HP (2005) Power efficient processor architecture and the Cell processor. In: HPCA ’05: proceedings of the 2005 international symposium on high-performance computer architecture. IEEE Computer Society Press, Los Alamitos, pp 258–262
  19. Hormati AH, Choi Y, Kudlur M, Rabbah R, Mudge T, Mahlke S (2009) Flextream: Adaptive compilation of streaming applications for heterogeneous architectures. In: PACT ’09: proceedings of the 2009 18th international conference on parallel architectures and compilation techniques, Washington, DC, USA. IEEE Computer Society Press, Los Alamitos, pp 214–223
  20. IBM Redbooks (2008) Programming the cell broadband engine architecture: examples and best practices. http://www.redbooks.ibm.com
  21. IDC (2008) PC semiconductor market briefing: re-architecting the PC and the migration of value, June 2008, http://www.idc.com
  22. ISO/IEC 8652:2007 (2006) Ada reference manual, 3rd edn
  23. Kahn G (1974) The semantics of a simple language for parallel programming. In: Rosenfeld JL (ed) Information processing, Stockholm, Sweden, Aug. North Holland, Amsterdam, pp 471–475
  24. Kapasi UJ, Dally WJ, Rixner S, Owens JD, Khailany B (2002) The imagine stream processor. In: Computer design, international conference on, p 282
  25. Karczmarek M (2002) Constrained and phased scheduling of synchronous data flow graphs for the StreamIt language. Master’s thesis, Massachusetts Institute of Technology
  26. Karczmarek M, Thies W, Amarasinghe S (2003) Phased scheduling of stream programs. ACM SIGPLAN Not 38(7):103–112
  27. Karczmarek M, Thies W, Amarasinghe S (2003) Phased scheduling of stream programs. In: LCTES ’03: proceedings of the 2003 ACM SIGPLAN/SIGBED conference on languages, compilers, and tools for embedded systems, vol 38, pp 1235–1245
  28. Karp RM, Miller RE (1966) Properties of a model for parallel computations: determinacy, termination, queueing. SIAM J Appl Math 14(6):1390–1411
  29. Kudlur M, Mahlke S (2008) Orchestrating the execution of stream programs on multicore platforms. In: PLDI ’08: proceedings of the 2008 ACM SIGPLAN conference on programming language design and implementation. ACM Press, New York
  30. Lee EA, Messerschmitt DG (1987) Static scheduling of synchronous data flow programs for digital signal processing. IEEE Trans Comput 36(1):24–35
  31. Lee EA, Messerschmitt DG (1987) Synchronous data flow. Proc IEEE 75(9):1235–1245
  32. Leung M-K, Liu I, Zou J (2008) Code generation for process network models onto parallel architectures. Technical Report UCB/EECS-2008-139, EECS Department, University of California, Berkeley
  33. Lin C, Snyder L (2008) Principles of parallel programming. Addison-Wesley, Reading
  34. Lin Y, Choi Y, Mahlke SA, Mudge TN, Chakrabarti C (2008) A parameterized dataflow language extension for embedded streaming systems. In: SAMOS ’08: proceedings of the 2008 international conference on embedded computer systems: architectures, modeling, and simulation, pp 10–17
  35. Mattson TG, Sanders BA, Massingill BL (2007) Patterns for parallel programming, 3rd edn. Addison-Wesley, Reading
  36. Michael EW, Taylor M, Sarkar V, Lee W, Lee V, Kim J, Frank M, Finch P, Devabhaktuni S, Barua R, Babb J, Amarasinghe S, Agarwal A (1997) Baring it all to software: the Raw machine. Computer 30:86–93
  37. Pacheco PS (1996) Parallel programming with MPI. Morgan Kaufmann, San Francisco
  38. Reinders J (2007) Intel threading building blocks. O’Reilly, Sebastopol
  39. Sedgewick R (2002) Algorithms in C++, 3rd edn. Addison-Wesley-Longman, Reading
  40. Sermulins J, Thies W, Rabbah R, Amarasinghe S (2005) Cache aware optimization of stream programs. In: LCTES ’05: proceedings of the 2005 ACM SIGPLAN/SIGBED conference on languages, compilers, and tools for embedded systems. ACM Press, New York, pp 115–126
  41. Spring JH, Privat J, Guerraoui R, Vitek J (2007) StreamFlex: High-throughput stream programming in Java. In: OOPSLA ’07: proceedings of the 2007 ACM SIGPLAN conference on object-oriented programming systems and applications
  42. Stephens R (1997) A survey of stream processing. Acta Inform 34:491–541
  43. StreamIt research group (2006) StreamIt Cookbook. Online reference manual. Massachusetts Institute of Technology
  44. StreamIt Web Site (2010) http://groups.csail.mit.edu/cag/streamit/, retrieved Dec 2010
  45. Thies W (2009) Language and compiler support for stream programs. PhD thesis, Massachusetts Institute of Technology
  46. Thies W, Amarasinghe S (2010) An empirical characterization of stream programs and its implications for language and compiler design. In: PACT ’10: proceedings of the 2010 conference on parallel architectures and compilation techniques. ACM Press, New York
  47. Thies W, Karczmarek M, Amarasinghe SP (2002) StreamIt: A Language for Streaming Applications. In: CC ’02: proceedings of the 11th international conference on compiler construction, London, UK, LNCS. Springer, Berlin, pp 179–196
  48. Udupa A, Govindarajan R, Thazhuthaveetil MJ (2009) Software pipelined execution of stream programs on GPUs. In: CGO ’09: proceedings of the 7th Annual IEEE/ACM international symposium on code generation and optimization. IEEE Computer Society Press, Los Alamitos
  49. Udupa A, Govindarajan R, Thazhuthaveetil MJ (2009) Synergistic execution of stream programs on multicores with accelerators. In: LCTES ’09: proceedings of the 2009 ACM SIGPLAN/SIGBED conference on languages, compilers, and tools for embedded systems
  50. Wei H, Yu J, Yu H, Gao GR (2010) Minimizing communication in rate-optimal software pipelining for stream programs. In: CGO ’10: proceedings of the 8th annual IEEE/ACM international symposium on code generation and optimization. ACM Press, New York, pp 210–217
  51. Zhang D, Li QJ, Rabbah R, Amarasinghe S (2008) A lightweight streaming layer for multicore execution. SIGARCH Comput Archit News 36(2):18–27
  52. Zhang D, Li Z, Song H, Liu L (2005) A programming model for an embedded media processing architecture. In: SAMOS ’05: proceedings of the 2005 international conference on embedded computer systems: architectures, modeling, and simulation, LNCS. Springer, Berlin

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Jingun Hong (1)
  • Kirak Hong (1)
  • Bernd Burgstaller (1)
  • Johann Blieberger (2)
  1. Yonsei University, Seoul, Korea
  2. Vienna University of Technology, Vienna, Austria