TAPRE-HBM: Trace-Based Processor Rapid Emulation Using HBM on FPGAs

  • Conference paper
Applied Reconfigurable Computing. Architectures, Tools, and Applications (ARC 2023)

Abstract

High Bandwidth Memory (HBM) is one way to overcome the memory bandwidth bottleneck. The integration of HBM into Field Programmable Gate Arrays (FPGAs) now makes this memory technology available to a wide range of applications, including embedded systems. Nevertheless, the use of HBM poses major challenges for architecture development. In addition to supporting highly parallel access, an architecture must hide high latencies. The partitioning of the data and the bus structure also play a decisive role. Finally, memory controller implementations are mostly vendor-specific, which makes it difficult to predict the exact performance of the memory subsystem. In this paper, we present TAPRE-HBM, an FPGA-based rapid prototyping platform for analyzing computer architectures with HBM memory backends. The goal of this work is to evaluate and assess the impact of particular memory access patterns. Because these patterns are an emergent property of the architecture and the application, they can be captured as traces by simulating the target computer architecture that is to use the HBM memory subsystem, without the need for a specific implementation or integration. This method reveals any incurred latency, even if only a system-level model exists. Using the FPGA-based rapid prototyping platform, performance predictions can be made, and it can thus be determined whether the selected target architecture, or the software running on it, is suitable for use with HBM. We analyze the proposed platform using a vector processor as an example and present various optimizations to increase the memory bandwidth. Compared to other works, a high number of memory transactions can be simulated on real hardware, with a high memory interface frequency and arbitrary delays between transactions.
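To make the idea of a memory trace with arbitrary delays between transactions more concrete, the following is a minimal software sketch of trace replay. It is not the TAPRE-HBM implementation, which replays traces on real FPGA hardware. The record layout (`TraceEntry`), the fixed `latency_cycles`, and the `max_outstanding` limit are hypothetical simplifications chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TraceEntry:
    """One memory transaction in a trace (hypothetical layout, not from the paper)."""
    delay_cycles: int  # idle cycles to wait before this transaction may be issued
    address: int       # target address; unused by this toy model, kept for illustration
    is_write: bool     # read or write; unused by this toy model
    num_bytes: int = 32


def replay(trace: List[TraceEntry], latency_cycles: int, max_outstanding: int) -> int:
    """Return the cycle in which the last transaction completes.

    Assumptions of this toy model: every transaction takes the same fixed
    latency, at most one transaction is issued per cycle, and at most
    `max_outstanding` transactions are in flight at once.
    """
    completions: List[int] = []  # completion cycle per transaction, in issue order
    now = 0
    for i, entry in enumerate(trace):
        now += entry.delay_cycles
        if i >= max_outstanding:
            # Stall until the oldest in-flight transaction has completed.
            now = max(now, completions[i - max_outstanding])
        completions.append(now + latency_cycles)
        now += 1  # the issue slot itself takes one cycle
    return max(completions, default=0)


def bytes_per_cycle(trace: List[TraceEntry], total_cycles: int) -> float:
    """Average bandwidth of the replayed trace in bytes per cycle."""
    if total_cycles == 0:
        return 0.0
    return sum(e.num_bytes for e in trace) / total_cycles


if __name__ == "__main__":
    burst = [TraceEntry(0, 0x1000 + 32 * i, False) for i in range(4)]
    for limit in (2, 4):
        cycles = replay(burst, latency_cycles=10, max_outstanding=limit)
        print(limit, cycles, bytes_per_cycle(burst, cycles))
```

Even in this simplified model, the same trace finishes later when fewer transactions may be in flight, which is the kind of latency-hiding effect that a real measurement on the memory subsystem would quantify.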

Acknowledgement

The authors would like to thank AMD for the provided hardware and software under the Xilinx University Program.

Author information

Corresponding author

Correspondence to Johannes Knödtel.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Knödtel, J. et al. (2023). TAPRE-HBM: Trace-Based Processor Rapid Emulation Using HBM on FPGAs. In: Palumbo, F., Keramidas, G., Voros, N., Diniz, P.C. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2023. Lecture Notes in Computer Science, vol 14251. Springer, Cham. https://doi.org/10.1007/978-3-031-42921-7_21

  • DOI: https://doi.org/10.1007/978-3-031-42921-7_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-42920-0

  • Online ISBN: 978-3-031-42921-7

  • eBook Packages: Computer Science, Computer Science (R0)
