Source-to-Source Optimization for HLS

Cong, Jason; Huang, Muhuan; Pan, Peichen; Wang, Yuxin; Zhang, Peng

doi:10.1007/978-3-319-26408-0_8

Abstract

This chapter describes source code optimization techniques and automation tools for FPGA design with a high-level synthesis (HLS) design flow. HLS has lifted the design abstraction from RTL to C/C++, but in practice extensive source code rewriting is often required to achieve a good design, especially when the design space is too large to determine the proper design options in advance. This rewriting requires not only knowledge of hardware microarchitecture design but also familiarity with the coding style expected by high-level synthesis tools. Automatic source-to-source transformation techniques have long been applied in software compilation and optimization, and they can also greatly benefit FPGA accelerator design in an HLS design flow. In general, source-to-source optimization for FPGAs is much more complex and challenging than for CPU software, because the design space of microarchitecture choices is much larger and is combined with temporal/spatial resource allocation. The goal of source-to-source transformation is to reduce or eliminate the design abstraction gap between software/algorithm development and existing HLS design flows. Closing this gap would enable fully automated FPGA design flows for software developers, which is especially important for deploying FPGAs in data centers, where many software developers need to use FPGAs for acceleration efficiently and with minimal effort.

This work was performed while the author J. Cong served as the Chief Scientific Advisor of Falcon Computing Solutions Inc.

Acknowledgements

This work is partially supported by the National Science Foundation Small Business Innovation Research (SBIR) Grant No. 1520449 for the project entitled “Customized Computing for Big Data Applications”.

Author information

Authors and Affiliations

Falcon Computing Solutions, Inc., Los Angeles, CA, USA
Jason Cong, Muhuan Huang, Peichen Pan, Yuxin Wang & Peng Zhang
Computer Science Department, University of California, Los Angeles, CA, USA
Jason Cong

Corresponding author

Correspondence to Jason Cong.

Editor information

Editors and Affiliations

The University of Manchester, Manchester, United Kingdom
Dirk Koch
Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
Frank Hannig
Dept. of Computer Science, Hardware/Software Co-Design, Erlangen, Germany
Daniel Ziener

About this chapter

Cite this chapter

Cong, J., Huang, M., Pan, P., Wang, Y., Zhang, P. (2016). Source-to-Source Optimization for HLS. In: Koch, D., Hannig, F., Ziener, D. (eds) FPGAs for Software Programmers. Springer, Cham. https://doi.org/10.1007/978-3-319-26408-0_8

DOI: https://doi.org/10.1007/978-3-319-26408-0_8
Published: 18 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26406-6
Online ISBN: 978-3-319-26408-0
eBook Packages: Engineering, Engineering (R0)