International Journal of Parallel Programming, Volume 44, Issue 2, pp 233–256

SCnC: Efficient Unification of Streaming with Dynamic Task Parallelism

  • Dragoş Sbîrlea
  • Jun Shirako
  • Ryan Newton
  • Vivek Sarkar


Stream processing is a special form of the dataflow execution model that offers extensive opportunities for optimization and automatic parallelization. To take full advantage of the paradigm, programmers are typically required to learn a new language and re-implement their applications. This work shows that it is possible to exploit streaming as a safe and automatic optimization of a more general dataflow-based model, one in which computation kernels are written in standard, general-purpose languages and organized as a coordination graph. We propose streaming concurrent collections (SCnC), a streaming system that can efficiently run a subset of the programs supported by concurrent collections (CnC). CnC is a general-purpose parallel programming paradigm that integrates task parallelism and dataflow computing. The proposed streaming support allows application developers to reason about their program as a general dataflow graph while benefiting from the performance and tight memory footprint of stream parallelism whenever the program satisfies the streaming constraints. In this paper, we formally define the application requirements for using SCnC and outline a static decision procedure for identifying and processing eligible SCnC subgraphs. We present initial results showing that transitioning from general CnC to SCnC increases throughput by up to 40× for certain benchmarks, and also enables programs with large data sizes to execute in available memory in cases where CnC execution may run out of memory.


Keywords: Streaming, Task parallelism, Dynamic parallelism, Dataflow



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Dragoş Sbîrlea (1), email author
  • Jun Shirako (1)
  • Ryan Newton (2)
  • Vivek Sarkar (1)

  1. Rice University, Houston, USA
  2. Indiana University, Bloomington, USA
