Pipeline Synthesis and Optimization from Branched Feedback Dataflow Programs

Prihozhy, Anatoly; Casale-Brunet, Simone; Bezati, Endri; Mattavelli, Marco

doi:10.1007/s11265-020-01568-5

Published: 11 July 2020

Volume 92, pages 1091–1099 (2020)

Journal of Signal Processing Systems

Anatoly Prihozhy ORCID: orcid.org/0000-0002-1941-0806¹,
Simone Casale-Brunet²,
Endri Bezati³ &
Marco Mattavelli²

Abstract

Large dataflow designs result from the behavioral specification of modern complex digital systems and/or from unfolding and transforming looped and branched programs. Since deep-submicron silicon technology provides large amounts of available resources, pipelining optimization without (or with minimal) resource sharing can give significant performance advantages. High-level synthesis of CAL programs is particularly popular in computation-intensive applications (e.g., image and video processing, cryptography, and wireless communication), where feedback actors with data flows at input and output ports represent loop-like behavior. In this work, we propose techniques for transforming, analyzing, speculatively pipelining, and optimizing large branched feedback dataflow programs. We develop an accurate algorithm and introduce fast dynamic and mixed static/dynamic heuristics that first minimize the number of pipeline stages for a given pipeline-stage time period and then minimize the overall pipeline register size by appropriately assigning feedbacks and instructions to pipeline stages. We also propose a genetic algorithm for tuning the heuristics to a particular design. The experimental results show that the proposed algorithms quickly produce solutions that are very close to the accurate solutions and that they outperform earlier algorithms in computing time and pipeline parameters.
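As a rough illustration of the first objective (as few stages as possible for a given stage time period), the following Python sketch assigns operators to stages in ASAP fashion on a plain acyclic precedence graph. It is not the FASAP or FLCSBB algorithm of the paper: branch conditions, feedback regions, and the conflict graph are all ignored, and the names asap_stages, delay, preds, and t_stage are hypothetical, introduced only for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def asap_stages(delay, preds, t_stage):
    """Greedy ASAP stage assignment on an acyclic operator graph.

    delay   -- dict: operator -> combinational delay (assumed input)
    preds   -- dict: operator -> iterable of direct predecessors
    t_stage -- upper limit on the delay accumulated inside one stage
    Returns a dict: operator -> stage number (stages start at 1).
    """
    if any(d > t_stage for d in delay.values()):
        raise ValueError("an operator is slower than the stage time period")

    graph = {p: tuple(preds.get(p, ())) for p in delay}
    stage, finish = {}, {}
    # static_order() yields every operator after all of its predecessors.
    for p in TopologicalSorter(graph).static_order():
        # Earliest candidate stage: the latest stage among the predecessors.
        s = max((stage[q] for q in graph[p]), default=1)
        # Delay already accumulated in that stage on the paths leading to p.
        start = max((finish[q] for q in graph[p] if stage[q] == s), default=0)
        if start + delay[p] > t_stage:
            # p does not fit: a register boundary is placed before it.
            s, start = s + 1, 0
        stage[p] = s
        finish[p] = start + delay[p]
    return stage


# A chain a -> b -> c -> d with a stage time period of 5 time units.
example = asap_stages(
    delay={"a": 2, "b": 2, "c": 3, "d": 1},
    preds={"b": ["a"], "c": ["b"], "d": ["c"]},
    t_stage=5,
)
print(example)  # a and b share stage 1; c and d fall into stage 2
```

Operators a and b fit together in the first stage (accumulated delay 4), while c would push the delay to 7 and is therefore moved to a second stage, which d then shares. Minimizing register size, the second objective, would require moving operators between their earliest and latest feasible stages, which this sketch does not attempt.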

Acronyms

1. CAL Concurrent algorithmic language.

2. ASAP As-soon-as-possible pipeline scheduling algorithm.

3. ALAP As-late-as-possible pipeline scheduling algorithm.

4. P A set of operators (statements, instructions).

5. V A set of variables.

6. inputs(p) A set of input variables of operator p.

7. outputs(p) A set of output variables of operator p.

8. size(v) The bit size of variable v.

9. prod(v) The set of operators that produce variable v.

10. cons(v) The set of operators that consume variable v.

11. T A set of conditional Boolean variables.

12. Z A set of primary Boolean variables.

13. H A set of Boolean functions that evaluate the conditional variables over the primary variables.

14. F A set of feasible Boolean functions for values of pairs of primary variables.

15. λ A Boolean function that characterizes the feasible set of vector values of primary variables.

16. μ A Boolean function that takes value 1 when two conditional variables are orthogonal.

17. R_direct The direct precedence relation on operators.

18. R Transitive closure of R_direct.

19. succ(p) The set of successor operators of operator p.

20. pred(p) The set of predecessor operators of operator p.

21. G A matrix of longest-path lengths between all pairs of operators.

22. Fbr(s) A subset of operators in a feedback region of variable s.

23. S A set of pipeline stages.

24. stage(p) An assignment of operator p to a stage.

25. T_stage A constraint on the pipeline-stage time-period.

26. C An operator conflict relation (graph).

27. C_n An operator nonconflict relation (graph).

28. cdpred(p) A set of direct predecessors of operator p on graph C.

29. ncdpred(p) A set of direct predecessors of operator p on graph C_n.

30. cdsucc(p) A set of direct successors of operator p on graph C.

31. ncdsucc(p) A set of direct successors of operator p on graph C_n.

32. asap A pipeline schedule that the ASAP algorithm generates on conflict graph C.

33. alap A pipeline schedule that the ALAP algorithm generates on conflict graph C.

34. Rsize(stage) The overall pipeline register size of the schedule described by the operator assignment vector stage.

35. lifetime(v) Lifetime of variable v over pipeline stages.

36. FASAP Extension of ASAP for branched feedback programs.

37. FALAP Extension of ALAP for branched feedback programs.

38. fasap A pipeline schedule that the FASAP algorithm generates on conflict graph C.

39. falap A pipeline schedule that the FALAP algorithm generates on conflict graph C.

40. LCSBB Least-cost-search branch-and-bound algorithm for pipeline optimization.

41. FLCSBB Extension of LCSBB for branched feedback dataflow programs.

42. flcsbb A pipeline schedule that the FLCSBB algorithm generates on conflict graph C.

43. FHADD Feedback dataflow optimization Heuristic Algorithm using dynamic heuristics for operators and stages.

44. FHASD Feedback dataflow optimization Heuristic Algorithm using static heuristics for operators and dynamic heuristics for stages.

45. early(p) A lower bound of a range of available stages for operator p.

46. late(p) An upper bound of a range of available stages for operator p.

47. lifestim(v) A lower bound of the lifetime of variable v over pipeline stages.

48. early(v) An upper bound of the earliest stage of producers of variable v.

49. late(v) A lower bound of the latest stage of consumers of variable v.

50. pos(x) A function whose value equals 0 if x ≤ 0, and equals x otherwise.

51. χ(p) A heuristic weight of operator p whose maximal value indicates that p is preferable for scheduling.

52. ω A vector of heuristic factors.

53. ρ(p) A vector of heuristic parameters of operator p.

54. GA Genetic algorithm for tuning heuristics.

55. F(ω) A fitness function of individual ω that represents quality of the corresponding pipeline solution.

56. FPS A fitness proportionate selection operation.

57. WPS A worst parent selection operation.

58. WIS A worst individual selection in the current generation.

59. HUX A half uniform crossover operation.

60. SOX A single offspring crossover operation.

61. α, β Heuristic factor weights.

62. ε A mutation factor.

63. p_cross Probability of choosing the crossover.

64. p_mut Probability of choosing the mutation.
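To make the register-size notation above more concrete, the following Python sketch evaluates a stage assignment. It rests on two assumptions that the list does not confirm: that lifetime(v) equals pos applied to the latest consumer stage minus the earliest producer stage, and that Rsize(stage) is the sum of size(v) times lifetime(v) over all variables. Feedback variables and branch conditions are not modeled, and the example operators and variables are invented.

```python
def pos(x):
    """Return 0 if x <= 0, and x otherwise (item 50 of the list)."""
    return x if x > 0 else 0


def lifetime(v, stage, prod, cons):
    """Assumed lifetime: stage boundaries crossed between the earliest
    producer and the latest consumer of variable v."""
    if not prod.get(v) or not cons.get(v):
        return 0  # primary inputs and outputs are ignored in this sketch
    first = min(stage[p] for p in prod[v])
    last = max(stage[p] for p in cons[v])
    return pos(last - first)


def rsize(stage, size, prod, cons):
    """Assumed register size: sum of size(v) * lifetime(v) over variables."""
    return sum(size[v] * lifetime(v, stage, prod, cons) for v in size)


# Hypothetical schedule of four operators over three stages.
stage = {"a": 1, "b": 1, "c": 2, "d": 3}
size = {"x": 8, "y": 16, "z": 4}            # bit sizes of the variables
prod = {"x": ["a"], "y": ["b"], "z": ["c"]}  # operators producing each variable
cons = {"x": ["b", "d"], "y": ["c"], "z": ["d"]}  # operators consuming it

total = rsize(stage, size, prod, cons)
print(total)  # 8*2 + 16*1 + 4*1 = 36
```

Under these assumptions, variable x is held across two stage boundaries because its last consumer d sits in stage 3, so it dominates the cost relative to its width. Moving d to an earlier stage, where precedence and timing allow it, is the kind of reassignment a register-size optimization would look for.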

Author information

Authors and Affiliations

Computer and System Software Dpt, Belarusian National Technical University, Minsk, Belarus
Anatoly Prihozhy
EPFL SCI STI MM, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
Simone Casale-Brunet & Marco Mattavelli
EPFL IC IINFCOM VLSC, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
Endri Bezati

Corresponding author

Correspondence to Anatoly Prihozhy.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Prihozhy, A., Casale-Brunet, S., Bezati, E. et al. Pipeline Synthesis and Optimization from Branched Feedback Dataflow Programs. J Sign Process Syst 92, 1091–1099 (2020). https://doi.org/10.1007/s11265-020-01568-5

Received: 18 April 2019
Revised: 03 June 2020
Accepted: 11 June 2020
Published: 11 July 2020
Issue Date: October 2020
DOI: https://doi.org/10.1007/s11265-020-01568-5
