
A Dataflow IR for Memory Efficient RIPL Compilation to FPGAs

  • Robert Stewart
  • Greg Michaelson
  • Deepayan Bhowmik
  • Paulo Garcia
  • Andy Wallace
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10049)

Abstract

Field programmable gate arrays (FPGAs) are fundamentally different from fixed processor architectures because their memory hierarchies can be tailored to the needs of an algorithm. FPGA compilers for high-level languages are therefore not hindered by a fixed memory hierarchy. Instead, the constraint when compiling to FPGAs is the availability of resources.

In this paper we describe how the dataflow intermediate representation (IR) of RIPL (Rathlin Image Processing Language), our declarative image processing DSL for FPGAs, enables us to constrain memory use. We use five benchmarks to demonstrate that memory use with RIPL is comparable to that of the Vivado HLS OpenCV library, without the need for language pragmas to guide hardware synthesis. The benchmarks also show that RIPL is more expressive than the Darkroom FPGA image processing language.

Keywords

Domain specific languages, FPGAs, Data locality

Notes

Acknowledgements

We acknowledge the support of the Engineering and Physical Sciences Research Council, grant reference EP/K009931/1 (Programmable embedded platforms for remote and compute intensive image processing applications).

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Robert Stewart (1)
  • Greg Michaelson (1)
  • Deepayan Bhowmik (2)
  • Paulo Garcia (2)
  • Andy Wallace (2)
  1. School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, UK
  2. School of Engineering and Physical Sciences, Heriot-Watt University, Edinburgh, UK
