Abstract
Iterative stencil computations are present in many scientific and engineering applications. The acceleration of stencil codes using parallel architectures has been widely studied. The parallelization of the stencil computation on FPGA based heterogeneous architectures has been reported with the use of traditional RTL logic design or the use of directives in C/C++ codes on high level synthesis tools. In both cases, it has been shown that FPGAs provide better performance per watt compared to CPU or GPU-based systems. High level synthesis tools are limited to the use of parallelization directives without evaluating other possibilities of their application based on the adaptation of the algorithm. In this document, it is proposed a division of the inner loop of the stencil-based code in such a way that total latency is reduced using memory partition and pipeline directives. As a case study is used the two-dimensional Laplace equation implemented on a ZedBoard and an Ultra96 board using Vivado HLS. The performance is evaluated according to the amount of inner loop divisions and the on-chip memory partitions, in terms of the latency, power consumption, use of FPGA resources, and speed-up.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Castano, L., Osorio, G.: An approach to the numerical solution of one-dimensional heat equation on SoC FPGA. Revista Científica de Ingeniería Electrónica, Automática y Comunicaciones 38(2), 83–93 (2017). ISSN 1815–5928
Cattaneo, R., Natale, G., Sicignano, C., Sciuto, D., Santambrogio, M.D.: On how to accelerate iterative stencil loops: a scalable streaming-based approach. ACM Trans. Archit. Code Optim. (TACO) 12(4), 53 (2016)
Chugh, N., Vasista, V., Purini, S., Bondhugula, U.: A DSL compiler for accelerating image processing pipelines on FPGAs. In: 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT), pp. 327–338. IEEE (2016)
Cong, J., Li, P., Xiao, B., Zhang, P.: An optimal microarchitecture for stencil computation acceleration based on non-uniform partitioning of data reuse buffers. In: Proceedings of the 51st Annual Design Automation Conference, pp. 1–6. ACM (2014)
Deest, G., Estibals, N., Yuki, T., Derrien, S., Rajopadhye, S.: Towards scalable and efficient FPGA stencil accelerators. In: IMPACT 2016 - 6th International Workshop on Polyhedral Compilation Techniques, Held with HIPEAC 2016 (2016)
Deest, G., Yuki, T., Rajopadhye, S., Derrien, S.: One size does not fit all: implementation trade-offs for iterative stencil computations on FPGAs. In: 2017 27th International Conference on Field Programmable Logic and Applications (FPL), pp. 1–8. IEEE (2017)
Del Sozzo, E., Baghdadi, R., Amarasinghe, S., Santambrogio, M.D.: A common backend for hardware acceleration on FPGA. In: 2017 IEEE International Conference on Computer Design (ICCD), pp. 427–430. IEEE (2017)
Escobedo, J., Lin, M.: Graph-theoretically optimal memory banking for stencil-based computing kernels. In: Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 199–208. ACM (2018)
de Fine Licht, J., Blott, M., Hoefler, T.: Designing scalable FPGA architectures using high-level synthesis. In: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2018), vol. 53, pp. 403–404. ACM (2018)
Kobayashi, R., Oobata, Y., Fujita, N., Yamaguchi, Y., Boku, T.: OpenCL-ready high speed FPGA network for reconfigurable high performance computing. In: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, pp. 192–201. ACM (2018)
László, E., Nagy, Z., Giles, M.B., Reguly, I., Appleyard, J., Szolgay, P.: Analysis of parallel processor architectures for the solution of the Black-Scholes PDE. In: 2015 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1977–1980. IEEE (2015)
Liu, J., Bayliss, S., Constantinides, G.A.: Offline synthesis of online dependence testing: parametric loop pipelining for HLS. In: 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), pp. 159–162. IEEE (2015)
Liu, J., Wickerson, J., Bayliss, S., Constantinides, G.A.: Polyhedral-baseddynamic loop pipelining for high-level synthesis. IEEE Trans. Comput.-Aided Des. Integr. Circ. Syst. 37, 1802–1815 (2017)
Liu, J., Wickerson, J., Constantinides, G.A.: Loop splitting for efficient pipelining in high-level synthesis. In: 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), pp. 72–79. IEEE (2016)
Mokhov, A., et al.: Language and hardware acceleration backend for graph processing. In: 2017 Forum on Specification and Design Languages (FDL), pp. 1–7. IEEE (2017)
Mondigo, A., Ueno, T., Tanaka, D., Sano, K., Yamamoto, S.: Design and scalability analysis of bandwidth-compressed stream computing with multiple FPGAs. In: 2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), pp. 1–8. IEEE (2017)
Nacci, A.A., Rana, V., Bruschi, F., Sciuto, D., Beretta, I., Atienza, D.: A high-level synthesis flow for the implementation of iterative stencil loop algorithms on FPGA devices. In: Proceedings of the 50th Annual Design Automation Conference, p. 52. ACM (2013)
Natale, G., Stramondo, G., Bressana, P., Cattaneo, R., Sciuto, D., Santambrogio, M.D.: A polyhedral model-based framework for dataflow implementation on FPGA devices of iterative stencil loops. In: 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 1–8. IEEE (2016)
de Oliveira, C.B., Cardoso, J.M., Marques, E.: High-level synthesis from C vs. a DSL-based approach. In: 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, pp. 257–262. IEEE (2014)
Reagen, B., Adolf, R., Shao, Y.S., Wei, G.Y., Brooks, D.: Machsuite: benchmarks for accelerator design and customized architectures. In: 2014 IEEE International Symposium on Workload Characterization (IISWC), pp. 110–119. IEEE (2014)
Reiche, O., Özkan, M.A., Hannig, F., Teich, J., Schmid, M.: Loop parallelization techniques for FPGA accelerator synthesis. J. Signal Process. Syst. 90(1), 3–27 (2018)
Sakai, R., Sugimoto, N., Miyajima, T., Fujita, N., Amano, H.: Acceleration of full-pic simulation on a CPU-FPGA tightly coupled environment. In: 2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), pp. 8–14. IEEE (2016)
Sano, K., Hatsuda, Y., Yamamoto, S.: Multi-FPGA accelerator for scalable stencil computation with constant memory bandwidth. IEEE Trans. Parallel Distrib. Syst. 25(3), 695–705 (2014)
Schmid, M., Reiche, O., Schmitt, C., Hannig, F., Teich, J.: Code generation for high-level synthesis of multiresolution applications on FPGAs. arXiv preprint arXiv:1408.4721 (2014)
Shao, Y.S., Reagen, B., Wei, G.Y., Brooks, D.: Aladdin: a pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures. In: ACM SIGARCH Computer Architecture News, vol. 42, pp. 97–108. IEEE Press (2014)
Waidyasooriya, H.M., Takei, Y., Tatsumi, S., Hariyama, M.: Opencl-based FPGA-platform for stencil computation and its optimization methodology. IEEE Trans. Parallel Distrib. Syst. 28(5), 1390–1402 (2017)
Wang, S., Liang, Y.: A comprehensive framework for synthesizing stencil algorithms on FPGAs using OpenCL model. In: 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 1–6. IEEE (2017)
Zohouri, H.R., Podobas, A., Matsuoka, S.: Combined spatial and temporal blocking for high-performance stencil computation on FPGAs using OpenCL. In: Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 153–162. ACM (2018)
Acknowledgements
This study were supported by the AE&CC research Group COL0053581, at the Sistemas de Control y Robótica Laboratory, attached to the Instituto Tecnológico Metropolitano. This work is part of the project “Improvement of visual perception in humanoid robots for objects recognition in natural environments using Deep Learning” with ID P17224, co-funded by the Instituto Tecnológico Metropolitano and Universidad de Antioquia.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Castano-Londono, L., Alzate Anzola, C., Marquez-Viloria, D., Gallo, G., Osorio, G. (2019). Evaluation of Stencil Based Algorithm Parallelization over System-on-Chip FPGA Using a High Level Synthesis Tool. In: Figueroa-García, J., Duarte-González, M., Jaramillo-Isaza, S., Orjuela-Cañon, A., Díaz-Gutierrez, Y. (eds) Applied Computer Sciences in Engineering. WEA 2019. Communications in Computer and Information Science, vol 1052. Springer, Cham. https://doi.org/10.1007/978-3-030-31019-6_5
Download citation
DOI: https://doi.org/10.1007/978-3-030-31019-6_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31018-9
Online ISBN: 978-3-030-31019-6
eBook Packages: Computer ScienceComputer Science (R0)