Pipeline Synthesis and Optimization from Branched Feedback Dataflow Programs

Journal of Signal Processing Systems

Abstract

Large dataflow designs arise from the behavioral specification of modern complex digital systems and/or from unfolding and transforming looped and branched programs. Since deep-submicron silicon technology provides large amounts of available resources, pipeline optimization without (or with minimal) resource sharing can give significant performance advantages. High-level synthesis of CAL programs is particularly popular in computation-intensive applications (e.g., image and video processing, cryptography, and wireless communication), where feedback actors with data flows at input and output ports represent loop-like behavior. In this work, we propose techniques for transforming, analyzing, speculatively pipelining, and optimizing large branched feedback dataflow programs. We develop an exact algorithm and introduce fast dynamic and mixed static/dynamic heuristics that first minimize the number of pipeline stages for a given pipeline-stage time period, and second minimize the overall pipeline register size by appropriately assigning feedbacks and instructions to pipeline stages. We also propose a genetic algorithm for tuning the heuristics to a particular design. The experimental results show that the proposed algorithms quickly produce solutions that are very close to the exact solutions, and that they outperform earlier algorithms in both computing time and pipeline parameters.
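
The two objectives named above (few stages for a given stage time period, then a small total register size) can be illustrated with a minimal sketch. It is not the algorithm of this paper: it is a plain ASAP stage assignment with operator chaining on an acyclic graph, it ignores feedback and branching, and it assumes that a variable costs size(v) bits of register for every stage boundary between its producer and its last consumer. The function names, the toy operators, and all delays and bit sizes are hypothetical.

```python
from graphlib import TopologicalSorter


def asap_stages(delay, preds, t_stage):
    """Assign every operator to the earliest stage (numbered from 1) such that
    the chained delay inside each stage does not exceed t_stage.
    Assumes an acyclic precedence graph (no feedback)."""
    stage, finish = {}, {}
    graph = {p: preds.get(p, ()) for p in delay}
    for p in TopologicalSorter(graph).static_order():
        if delay[p] > t_stage:
            raise ValueError(f"operator {p} alone exceeds the stage time period")
        # Earliest stage allowed by the predecessors.
        s = max((stage[q] for q in graph[p]), default=1)
        # Delay already accumulated in that stage by chained predecessors.
        start = max((finish[q] for q in graph[p] if stage[q] == s), default=0)
        if start + delay[p] > t_stage:
            s, start = s + 1, 0  # does not fit: open the next stage
        stage[p], finish[p] = s, start + delay[p]
    return stage


def register_size(stage, producer, consumers, size):
    """Total register bits, assuming each variable is registered once per
    stage boundary between its producer and its latest consumer."""
    total = 0
    for v, p in producer.items():
        last = max((stage[c] for c in consumers.get(v, ())), default=stage[p])
        total += size[v] * (last - stage[p])
    return total


# Hypothetical toy program: a and b feed c, and c feeds d.
delay = {"a": 2, "b": 2, "c": 3, "d": 1}
preds = {"c": ["a", "b"], "d": ["c"]}
stage = asap_stages(delay, preds, t_stage=4)

producer = {"x": "a", "y": "b", "z": "c"}
consumers = {"x": ["c"], "y": ["c"], "z": ["d"]}
size = {"x": 8, "y": 16, "z": 8}
total_bits = register_size(stage, producer, consumers, size)
```

In this toy case, operators a and b share the first stage, c no longer fits within the time period and opens the second stage, and d chains behind c. Only x and y cross a stage boundary, so only they contribute register bits. Moving an operator between its earliest and latest feasible stage changes these lifetimes, which is the degree of freedom the register-size objective exploits.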

Acronyms

1. CAL Concurrent algorithmic language.

2. ASAP As soon as possible pipeline scheduling algorithm.

3. ALAP As late as possible pipeline scheduling algorithm.

4. P A set of operators (statements, instructions).

5. V A set of variables.

6. inputs(p) A set of input variables of operator p.

7. outputs(p) A set of output variables of operator p.

8. size(v) The bit size of variable v.

9. prod(v) The operators that produce variable v.

10. cons(v) The operators that consume variable v.

11. T A set of conditional Boolean variables.

12. Z A set of primary Boolean variables.

13. H A set of Boolean functions that evaluate the conditional variables over the primary variables.

14. F A set of feasible Boolean functions for values of pairs of primary variables.

15. λ A Boolean function that characterizes the feasible set of vector values of primary variables.

16. μ A Boolean function that takes value 1 when two conditional variables are orthogonal.

17. Rdirect The direct precedence relation on operators.

18. R The transitive closure of Rdirect.

19. succ(p) The successor operators of operator p.

20. pred(p) The predecessor operators of operator p.

21. G A matrix of longest-path lengths between all pairs of operators.

22. Fbr(s) A subset of operators in a feedback region of variable s.

23. S A set of pipeline stages.

24. stage(p) An assignment of operator p to a stage.

25. Tstage A constraint on the pipeline-stage time-period.

26. C An operator conflict relation (graph).

27. Cn An operator nonconflict relation (graph).

28. cdpred(p) A set of direct predecessors of operator p on graph C.

29. ncdpred(p) A set of direct predecessors of operator p on graph Cn.

30. cdsucc(p) A set of direct successors of operator p on graph C.

31. ncdsucc(p) A set of direct successors of operator p on graph Cn.

32. asap A pipeline schedule that the ASAP algorithm generates on conflict graph C.

33. alap A pipeline schedule that the ALAP algorithm generates on conflict graph C.

34. Rsize(stage) The overall pipeline register size of the schedule described by the operator-to-stage assignment vector stage.

35. lifetime(v) Lifetime of variable v over pipeline stages.

36. FASAP Extension of ASAP for branched feedback programs.

37. FALAP Extension of ALAP for branched feedback programs.

38. fasap A pipeline schedule that the FASAP algorithm generates on conflict graph C.

39. falap A pipeline schedule that the FALAP algorithm generates on conflict graph C.

40. LCSBB Least cost search branch and bound algorithm for pipeline optimization.

41. FLCSBB Extension of LCSBB for branched feedback dataflow programs.

42. flcsbb A pipeline schedule that the FLCSBB algorithm generates on conflict graph C.

43. FHADD Feedback dataflow optimization Heuristic Algorithm using dynamic heuristics for operators and stages.

44. FHASD Feedback dataflow optimization Heuristic Algorithm using static heuristics for operators and dynamic heuristics for stages.

45. early(p) A lower bound of the range of available stages for operator p.

46. late(p) An upper bound of the range of available stages for operator p.

47. lifestim(v) A lower bound on the lifetime of variable v over pipeline stages.

48. early(v) An upper bound on the earliest stage of the producers of variable v.

49. late(v) A lower bound on the latest stage of the consumers of variable v.

50. pos(x) A function whose value equals 0 if x ≤ 0, and equals x otherwise.

51. χ(p) A heuristic weight of operator p; the operator with the maximal weight is preferred for scheduling.

52. ω A vector of heuristic factors.

53. ρ(p) A vector of heuristic parameters of operator p.

54. GA Genetic algorithm for tuning heuristics.

55. F(ω) A fitness function of individual ω that represents quality of the corresponding pipeline solution.

56. FPS A fitness proportionate selection operation.

57. WPS A worst parent selection operation.

58. WIS A worst individual selection in the current generation.

59. HUX A half uniform crossover operation.

60. SOX A single offspring crossover operation.

61. α, β Heuristic factor weights.

62. ε A mutation factor.

63. pcross The probability of choosing crossover.

64. pmut The probability of choosing mutation.
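
Several of the entries above (pos(x), FPS, HUX) name standard building blocks. The following minimal sketch shows their textbook forms; the exact variants used in this paper may differ. It assumes that an individual ω is a plain list of heuristic factors, that fitness values are non-negative with a positive sum, and that larger fitness is better. All function names are hypothetical.

```python
import random


def pos(x):
    """pos(x) as defined in the list above: 0 if x <= 0, otherwise x."""
    return x if x > 0 else 0


def fitness_proportionate_selection(population, fitness, rng):
    """Roulette-wheel selection: an individual is picked with probability
    proportional to its fitness (assumed non-negative, not all zero)."""
    weights = [fitness(individual) for individual in population]
    return rng.choices(population, weights=weights, k=1)[0]


def half_uniform_crossover(a, b, rng):
    """Textbook HUX: swap exactly half (rounded down) of the positions
    in which the two parents differ, chosen at random."""
    differing = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    rng.shuffle(differing)
    child_a, child_b = list(a), list(b)
    for i in differing[: len(differing) // 2]:
        child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b


rng = random.Random(0)  # fixed seed so that runs are repeatable
parent_a = [1, 2, 3, 4, 5]
parent_b = [1, 9, 8, 7, 5]
child_a, child_b = half_uniform_crossover(parent_a, parent_b, rng)
```

The parents here differ in three positions, so exactly one position is exchanged and each child stays close to one parent. Passing an explicit random.Random instance keeps a tuning run reproducible.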

Author information

Corresponding author

Correspondence to Anatoly Prihozhy.

About this article

Cite this article

Prihozhy, A., Casale-Brunet, S., Bezati, E. et al. Pipeline Synthesis and Optimization from Branched Feedback Dataflow Programs. J Sign Process Syst 92, 1091–1099 (2020). https://doi.org/10.1007/s11265-020-01568-5

  • DOI: https://doi.org/10.1007/s11265-020-01568-5
