Tools and Techniques for Implementation of Real-time Video Processing Algorithms
This paper describes flexible tools and techniques for efficiently designing and generating a wide variety of hardware IP blocks for highly parameterized real-time video processing algorithms. The tools and techniques include host software, FPGA interface IP (PCIe, USB 3.0, DRAM), high-level synthesis, RTL generation tools, and synthesis automation, as well as architectural concepts (e.g., nested pipelining), an architectural estimation tool, and a verification methodology. The paper also presents a specific use case in which these tools and techniques are applied to the hardware design of an optical flow algorithm. It shows that, in a fairly short amount of time, we were able to implement 11 versions of the optical flow algorithm running on 3 different FPGAs (from 2 different vendors), while generating and synthesizing several thousand designs for architectural trade-off analysis.
Keywords: Hardware IP generation, Real-time video processing, High-level synthesis, FPGA, Optical flow, Nested pipelining
This work was supported by the Scientific and Technological Research Council of Turkey (TÜBİTAK) through project no. 114E343 and by the European Union’s Artemis Joint Undertaking as part of the ALMARVI project (Grant Agreement 621439).