Abstract
FPGAs are receiving increased attention as a promising architecture for accelerators in HPC systems. Evolving and maturing development tools based on high-level synthesis promise productivity improvements for this technology. However, up to now, FPGA designs for complex simulation workloads, like shallow water simulations based on discontinuous Galerkin discretizations, rely to a large degree on manual application-specific optimizations. In this work, we present a new approach to port shallow water simulations to FPGAs, based on a code-generation framework for high-level abstractions in combination with a template-based stencil processing library that provides FPGA-specific optimizations for a streaming execution model. The new implementation uses a structured grid representation suitable for stencil computations and is compared to an adaptation from an existing hand-optimized FPGA dataflow design supporting unstructured meshes. While there are many differences, for example in the numerical details and problem scalability to be discussed, we demonstrate that overall both approaches can yield meaningful results at competitive performance for the same target FPGA, thus demonstrating a new level of maturity for FPGA-accelerated scientific simulations.
Acknowledgments
The authors gratefully acknowledge the funding of this project by computing time provided by the Paderborn Center for Parallel Computing (PC2). The authors gratefully acknowledge the scientific support and HPC resources provided by the Erlangen National High Performance Computing Center (NHR@FAU) of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU). The hardware is funded by the German Research Foundation (DFG). The work in this paper was supported in part by the DFG through grant AI 117/6-1 ‘Performance optimized software strategies for unstructured-mesh applications in ocean modeling’.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Alt, C. et al. (2023). Shallow Water DG Simulations on FPGAs: Design and Comparison of a Novel Code Generation Pipeline. In: Bhatele, A., Hammond, J., Baboulin, M., Kruse, C. (eds) High Performance Computing. ISC High Performance 2023. Lecture Notes in Computer Science, vol 13948. Springer, Cham. https://doi.org/10.1007/978-3-031-32041-5_5
DOI: https://doi.org/10.1007/978-3-031-32041-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-32040-8
Online ISBN: 978-3-031-32041-5
eBook Packages: Computer Science (R0)
