Journal of Signal Processing Systems, Volume 87, Issue 1, pp 3–20

Profile Guided Dataflow Transformation for FPGAs and CPUs

  • Robert Stewart
  • Deepayan Bhowmik
  • Andrew Wallace
  • Greg Michaelson


This paper proposes a new high-level approach for optimising field programmable gate array (FPGA) designs. FPGA designs are commonly implemented in low-level hardware description languages (HDLs), which lack the abstractions necessary for identifying opportunities for significant performance improvements. Using a computer vision case study, we show that modelling computation with dataflow abstractions enables substantial restructuring of FPGA designs before lowering to the HDL level, and also improves CPU performance. Using the CPU transformations, runtime is reduced by 43%. Using the FPGA transformations, clock frequency is increased from 67 MHz to 110 MHz. Our results outperform commercial low-level HDL optimisations, showcasing dataflow program abstraction as an amenable computation model for highly effective FPGA optimisation.


Dataflow, Profiling, Transformations, FPGA, CPU



We acknowledge the support of the Engineering and Physical Sciences Research Council, grant reference EP/K009931/1 (Programmable embedded platforms for remote and compute intensive image processing applications). The authors thank Blair Archibald for helpful feedback.



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Robert Stewart (1)
  • Deepayan Bhowmik (2)
  • Andrew Wallace (2)
  • Greg Michaelson (1)

  1. School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, UK
  2. School of Engineering and Physical Sciences, Heriot-Watt University, Edinburgh, UK
