Evaluation of Stencil Based Algorithm Parallelization over System-on-Chip FPGA Using a High Level Synthesis Tool

  • Conference paper
Applied Computer Sciences in Engineering (WEA 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1052)

Abstract

Iterative stencil computations are present in many scientific and engineering applications, and the acceleration of stencil codes on parallel architectures has been widely studied. Parallelization of stencil computations on FPGA-based heterogeneous architectures has been reported using either traditional RTL design or directives in C/C++ code with high level synthesis tools. In both cases, FPGAs have been shown to provide better performance per watt than CPU- or GPU-based systems. However, work with high level synthesis tools has generally been limited to applying parallelization directives, without evaluating other possibilities based on adapting the algorithm. This paper proposes dividing the inner loop of the stencil-based code so that total latency is reduced using memory partition and pipeline directives. The two-dimensional Laplace equation, implemented on a ZedBoard and an Ultra96 board using Vivado HLS, is used as a case study. Performance is evaluated as a function of the number of inner loop divisions and on-chip memory partitions, in terms of latency, power consumption, FPGA resource usage, and speed-up.
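
The paper's own code is not part of this preview, so the following is only a minimal sketch of what dividing the inner loop of a two-dimensional Laplace (Jacobi) sweep might look like in Vivado HLS style C++. The grid size `N`, the division count `DIV`, the function names, the `float` data type, and the block partitioning with factor 4 are all assumptions made for illustration, not the authors' configuration.

```cpp
#include <cassert>
#include <cmath>

constexpr int N   = 18;             // grid points per side (hypothetical size)
constexpr int DIV = 4;              // number of inner-loop divisions (hypothetical)
constexpr int SEG = (N - 2) / DIV;  // interior columns handled by each division
static_assert((N - 2) % DIV == 0, "interior width must split evenly");

// Reference sweep: one Jacobi update of every interior point.
void laplace_sweep_ref(const float in[N][N], float out[N][N]) {
  for (int i = 1; i < N - 1; ++i)
    for (int j = 1; j < N - 1; ++j)
      out[i][j] = 0.25f * (in[i - 1][j] + in[i + 1][j] +
                           in[i][j - 1] + in[i][j + 1]);
}

// Divided sweep: the column loop runs over one segment only, and each
// iteration updates the matching column of every segment.
void laplace_sweep_div(const float in[N][N], float out[N][N]) {
// Assumed directives: split the column dimension into banks so the DIV
// updates can access memory concurrently (factor chosen for illustration).
#pragma HLS ARRAY_PARTITION variable=in block factor=4 dim=2
#pragma HLS ARRAY_PARTITION variable=out block factor=4 dim=2
  for (int i = 1; i < N - 1; ++i) {
    for (int j = 0; j < SEG; ++j) {
// Pipelining this loop makes Vivado HLS unroll the loop nested inside it.
#pragma HLS PIPELINE II=1
      for (int d = 0; d < DIV; ++d) {
        const int c = 1 + d * SEG + j;  // column inside division d
        out[i][c] = 0.25f * (in[i - 1][c] + in[i + 1][c] +
                             in[i][c - 1] + in[i][c + 1]);
      }
    }
  }
}

// Host-side check: largest absolute difference between the two sweeps.
float self_check() {
  static float in[N][N], ref[N][N], got[N][N];
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      in[i][j] = static_cast<float>((i * 7 + j * 3) % 11);
  laplace_sweep_ref(in, ref);
  laplace_sweep_div(in, got);
  float worst = 0.0f;
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      worst = std::fmax(worst, std::fabs(ref[i][j] - got[i][j]));
  return worst;
}
```

A standard C++ compiler ignores the `#pragma HLS` lines, so `self_check()` only confirms that the divided loop computes the same values as the plain one. Any effect on latency, resource usage, or power would have to be read from a synthesis report for the target board.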

Acknowledgements

This study was supported by the AE&CC research group COL0053581, at the Sistemas de Control y Robótica Laboratory, attached to the Instituto Tecnológico Metropolitano. This work is part of the project “Improvement of visual perception in humanoid robots for objects recognition in natural environments using Deep Learning” with ID P17224, co-funded by the Instituto Tecnológico Metropolitano and Universidad de Antioquia.

Corresponding author

Correspondence to Luis Castano-Londono.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Castano-Londono, L., Alzate Anzola, C., Marquez-Viloria, D., Gallo, G., Osorio, G. (2019). Evaluation of Stencil Based Algorithm Parallelization over System-on-Chip FPGA Using a High Level Synthesis Tool. In: Figueroa-García, J., Duarte-González, M., Jaramillo-Isaza, S., Orjuela-Cañon, A., Díaz-Gutierrez, Y. (eds) Applied Computer Sciences in Engineering. WEA 2019. Communications in Computer and Information Science, vol 1052. Springer, Cham. https://doi.org/10.1007/978-3-030-31019-6_5

  • DOI: https://doi.org/10.1007/978-3-030-31019-6_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-31018-9

  • Online ISBN: 978-3-030-31019-6

  • eBook Packages: Computer Science, Computer Science (R0)
