Abstract
The use of High Bandwidth Memory (HBM) is one way to overcome the memory bandwidth bottleneck. The integration of HBM into Field Programmable Gate Arrays (FPGAs) now also makes this memory technology available to a wide range of applications, even embedded systems. Nevertheless, the use of HBM poses major challenges for architecture development. In addition to supporting highly parallel access, high latencies must be hidden. Furthermore, the partitioning of the data and the bus structure play a decisive role. Finally, memory controller implementations are mostly vendor-specific, making it difficult to predict the exact performance of the memory subsystem. In this paper, we present TAPRE-HBM, an FPGA-based rapid prototyping platform for analyzing computer architectures with HBM backends. The goal of this work is to evaluate and assess the impact of particular memory access patterns. As these patterns are an emergent property of the architecture and application, traces of them can be created by simulating the target computer architecture that is to use the HBM memory subsystem, without the need for a specific implementation or integration. Any incurred latency is revealed by this method, even if only a system-level model exists. Using the FPGA-based rapid prototyping platform, performance predictions can be made, and thus it can be determined whether the selected target architecture, or the software running on it, is suitable for use with HBM. We analyze the proposed platform using a vector processor as an example and present various optimizations to increase the memory bandwidth. Compared to other works, a high number of memory transactions can be simulated on real hardware, with a high memory interface frequency and arbitrary delays between transactions.
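To illustrate the idea of a memory trace with delays between transactions, and why high latencies must be hidden by parallel access, the following is a minimal software sketch. It is not the TAPRE-HBM implementation and does not model HBM timing: the record layout `TraceEntry`, the function `replay`, the fixed latency, and the limit on outstanding transactions are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEntry:
    """One memory transaction of a trace (hypothetical layout, not from the paper)."""
    delay_cycles: int  # idle cycles the requester waits before issuing this entry
    address: int       # byte address of the access
    num_bytes: int     # payload size of the access
    is_write: bool     # True for a write, False for a read


def replay(trace, latency_cycles, max_outstanding):
    """Toy replay model: every transaction completes a fixed number of cycles
    after it is issued, at most `max_outstanding` transactions are in flight,
    and at most one transaction is issued per cycle.
    Returns the cycle at which the last transaction completes."""
    in_flight = deque()  # completion cycles, oldest first (monotonic for fixed latency)
    now = 0
    last_done = 0
    for entry in trace:
        now += entry.delay_cycles
        # Retire everything that has already completed.
        while in_flight and in_flight[0] <= now:
            in_flight.popleft()
        # If the window is full, stall until the oldest transaction completes.
        if len(in_flight) >= max_outstanding:
            now = in_flight.popleft()
        last_done = now + latency_cycles
        in_flight.append(last_done)
        now += 1  # issue slot consumed
    return last_done


# Eight back-to-back 32-byte reads (addresses are arbitrary example values).
trace = [TraceEntry(0, 0x1000 + 32 * i, 32, False) for i in range(8)]
total_bytes = sum(e.num_bytes for e in trace)
for depth in (1, 8):
    cycles = replay(trace, latency_cycles=100, max_outstanding=depth)
    print(f"outstanding={depth}: {cycles} cycles, {total_bytes / cycles:.2f} bytes/cycle")
```

In this toy model, allowing a single outstanding transaction serializes the full latency of every access (800 cycles for the eight reads), whereas allowing eight overlaps them (107 cycles). A real HBM subsystem adds bank conflicts, refresh, and controller behavior that such a model cannot capture, which is the motivation for measuring on real hardware.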
Acknowledgement
The authors would like to thank AMD for the provided hardware and software under the Xilinx University Program.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Knödtel, J. et al. (2023). TAPRE-HBM: Trace-Based Processor Rapid Emulation Using HBM on FPGAs. In: Palumbo, F., Keramidas, G., Voros, N., Diniz, P.C. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2023. Lecture Notes in Computer Science, vol 14251. Springer, Cham. https://doi.org/10.1007/978-3-031-42921-7_21
DOI: https://doi.org/10.1007/978-3-031-42921-7_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42920-0
Online ISBN: 978-3-031-42921-7
eBook Packages: Computer Science, Computer Science (R0)