Design Flow for GPU and Multicore Execution of Dynamic Dataflow Programs

Journal of Signal Processing Systems

Abstract

Dataflow programming has received increasing attention in the age of multicore and heterogeneous computing. Modular and concurrent dataflow program descriptions enable highly automated approaches for design space exploration, optimization and deployment of applications. A great advance in dataflow programming has been the recent introduction of the RVC-CAL language. Having been standardized by the ISO, the RVC-CAL dataflow language provides a solid basis for the development of tools, design methodologies and design flows. This paper proposes a novel design flow for mapping RVC-CAL dataflow programs to parallel and heterogeneous execution platforms. Through the proposed design flow, the programmer can describe an application in the RVC-CAL language and map it to multi- and many-core platforms, as well as GPUs, for efficient execution. The functionality and efficiency of the proposed approach are demonstrated by a parallel implementation of a video processing application and a run-time reconfigurable filter for telecommunications. Experiments are performed on GPU and multicore platforms with up to 16 cores, and the results show that for high-performance applications the proposed design flow provides up to 4× higher throughput than the state-of-the-art approach in multicore execution of RVC-CAL programs.

Notes

  1. https://github.com/orcc, http://www.dal.ethz.ch/

Acknowledgments

The authors thank the anonymous reviewers for their constructive comments. This work was funded by the Academy of Finland project UNICODE.

Author information

Corresponding author

Correspondence to J. Boutellier.

About this article

Cite this article

Boutellier, J., Nyländen, T. Design Flow for GPU and Multicore Execution of Dynamic Dataflow Programs. J Sign Process Syst 89, 469–478 (2017). https://doi.org/10.1007/s11265-017-1260-8

  • DOI: https://doi.org/10.1007/s11265-017-1260-8
