
Leveraging HPC accelerator architectures with modern techniques — hydrologic modeling on GPUs with ParFlow

A Correction to this article was published on 12 August 2021

This article has been updated


Rapidly changing heterogeneous supercomputer architectures pose a great challenge to many scientific communities trying to leverage the latest technology in high-performance computing. Many existing projects with a long development history have accumulated a large amount of code that is not directly compatible with the latest accelerator architectures. Furthermore, due to the limited resources of scientific institutions, developing and maintaining architecture-specific ports is generally unsustainable. In order to adapt to modern accelerator architectures, many projects rely on directive-based programming models or build the codebase tightly around a third-party domain-specific language or library. This introduces external dependencies that are outside the project's control. This paper tackles the issue by proposing a lightweight application-side adaptor layer for compute kernels and memory management, resulting in a versatile and inexpensive adaptation to new accelerator architectures with few drawbacks. A widely used hydrologic model demonstrates that such an approach, pursued more than 20 years ago, is still paying off with modern accelerator architectures, as shown by a very significant performance gain from NVIDIA A100 GPUs, high developer productivity, and a minimally invasive implementation, all while the codebase remains maintainable in the long term.
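To make the idea of an application-side adaptor layer more concrete, the following is a minimal C++ sketch of how compute kernels and memory management might be routed through a thin layer owned by the application. All names here (the `adaptor` namespace, `alloc`, `release`, `for_each_3d`, `BOX_LOOP_3D`, `fill_and_sum`) are hypothetical and chosen for illustration; they are not ParFlow's actual API. Only a serial backend is shown, and the comments indicate where an accelerator backend could be substituted.

```cpp
// Hypothetical adaptor layer: every name below is an illustrative assumption,
// not taken from ParFlow.
#include <cassert>
#include <cstddef>
#include <cstdlib>

namespace adaptor {

// Memory management hook: application code never calls malloc/free directly.
// An accelerator backend could swap these bodies for device-aware allocation.
inline void* alloc(std::size_t bytes) { return std::malloc(bytes); }
inline void release(void* p) { std::free(p); }

// Kernel hook: the serial backend runs the loop body over a 3D box.
// An accelerator backend could instead launch the same body as a device kernel.
template <typename Body>
inline void for_each_3d(int nx, int ny, int nz, Body body) {
  for (int k = 0; k < nz; ++k)
    for (int j = 0; j < ny; ++j)
      for (int i = 0; i < nx; ++i)
        body(i, j, k);
}

}  // namespace adaptor

// Loop macro seen by application code; only its definition changes per backend.
#define BOX_LOOP_3D(i, j, k, nx, ny, nz, ...) \
  adaptor::for_each_3d((nx), (ny), (nz), [=](int i, int j, int k) __VA_ARGS__)

// Application-side kernel, written once against the adaptor layer.
inline double fill_and_sum(int nx, int ny, int nz) {
  const std::size_t n = static_cast<std::size_t>(nx) * ny * nz;
  double* field = static_cast<double*>(adaptor::alloc(n * sizeof(double)));
  assert(field != nullptr);

  // The loop body is captured by value, so it only touches data via pointers.
  BOX_LOOP_3D(i, j, k, nx, ny, nz, {
    field[(static_cast<std::size_t>(k) * ny + j) * nx + i] = i + j + k;
  });

  // Plain host-side reduction to check the result.
  double sum = 0.0;
  for (std::size_t idx = 0; idx < n; ++idx) sum += field[idx];
  adaptor::release(field);
  return sum;
}
```

In this sketch the application code depends only on the macro and the two memory hooks, so retargeting to another architecture would mean changing the layer rather than each kernel. Whether the real implementation is structured this way is not stated in the abstract.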





The work described in this paper has received funding from the Helmholtz Association (HGF) through the project “Advanced Earth System Modeling Capacity (ESM)” and the Pilot Laboratory Exa-ESM. The authors gratefully acknowledge the computing time granted through the ESM test partition on the supercomputer JUWELS at the Jülich Supercomputing Centre, Forschungszentrum Jülich, Germany. The authors also gratefully acknowledge support from the European Commission Horizon 2020 research and innovation program under Grant Agreement No. 824158 (EoCoE-II). Furthermore, the NVIDIA Application Lab at the Jülich Supercomputing Centre is thanked for technical support regarding the CUDA implementation. Finally, the foundations for the ParFlow eDSL were laid by Steven Smith, Rob Falgout, and Chuck Baldwin, all from Lawrence Livermore National Laboratory, USA.

Code availability

ParFlow source code is covered by the GNU Lesser General Public License and is available in a public repository at (last access: 27th October 2020). The commit 974c7bb dated 21st October 2020 was used in this paper.


Open Access funding enabled and organized by Projekt DEAL.

Author information




Jaro Hokkanen performed the technical developments and analyses and wrote the manuscript; Stefan Kollet advised on ParFlow technical issues, contributed to the analyses, and co-wrote the manuscript; Jiri Kraus, Andreas Herten, and Markus Hrywniak provided technical support regarding the implementation, optimization, and the HPC environment; Dirk Pleiter contributed to the analyses and the manuscript.

Corresponding author

Correspondence to Jaro Hokkanen.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: The original publication of the article contains major typesetting and production errors introduced by the publisher. During proofreading, corrections provided by authors were not honored by the publisher.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Hokkanen, J., Kollet, S., Kraus, J. et al. Leveraging HPC accelerator architectures with modern techniques — hydrologic modeling on GPUs with ParFlow. Comput Geosci 25, 1579–1590 (2021).




Keywords

  • High-performance computing (HPC)
  • GPU computing
  • Distributed memory parallelism
  • Accelerator architecture
  • Domain-specific language (DSL)

Mathematics Subject Classification (2010)

  • 65Y05
  • 68-04
  • 68N19