Throughput-Driven Parallel Embedded Software Synthesis from Synchronous Dataflow Models: Caveats and Remedies

  • Matin Hashemi
  • Kamyar Mirzazad Barijough
  • Soheil Ghiasi


Synchronous dataflow (SDF) graphs are often the computational model of choice for the specification, analysis, and automated synthesis of parallel streaming kernels targeting embedded multiprocessor system-on-a-chip (MPSoC) platforms. We discuss several limitations of SDF graphs in the context of conventional parallel software synthesis methodologies, and highlight the associated degradation in analysis accuracy and in the performance of the synthesized software. We then propose several extensions to the strict SDF graph model that address the identified issues. We present extensive empirical evaluations, which underscore both the model limitations and the effectiveness of our approach.





Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Matin Hashemi (1)
  • Kamyar Mirzazad Barijough (1)
  • Soheil Ghiasi (2)

  1. Sharif University of Technology, Tehran, Iran
  2. University of California, Davis, USA
