Skip to main content
Log in

Performance Comparison of HPX Versus Traditional Parallelization Strategies for the Discontinuous Galerkin Method

  • Published:
Journal of Scientific Computing Aims and scope Submit manuscript

Abstract

As high performance computing moves towards the exascale computing regime, applications are required to expose increasingly fine grain parallelism to efficiently use next generation supercomputers. Intended as a solution to the programming challenges associated with these architectures, High Performance ParalleX (HPX) is a task-based C++ runtime, which emphasizes the use of lightweight threads and algorithm-dependent synchronization to maximize parallelism exposed by the application to the machine. The aim of this work is to explore the performance benefits of an HPX parallelization versus a MPI parallelization for the discontinuous Galerkin finite element method for the two-dimensional shallow water equations. We present strong and weak scaling results comparing the performance of HPX versus a MPI parallelization strategy on Knights Landing architectures. Our results indicate that for average task sizes of \(3.6\,{\mathrm {m s}}\), HPX’s runtime overhead is offset by more efficient execution of the application. Furthermore, we demonstrate that running with sufficiently large task granularity, HPX is able to outperform the MPI parallelization by a factor of approximately 1.2 for up to 128 nodes.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Similar content being viewed by others

Notes

  1. http://stellar-group.org/.

References

  1. Amarasinghe, S., Hall, M., Lethin, R., Pingali, K., Quinlan, D., Sarkar, V., Shalf, J., Lucas, R., Yelick, K., Balanji, P., et al.: Exascale programming challenges. In: Proceedings of the Workshop on Exascale Programming Challenges, Marina del Rey, CA, USA. US Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR) (2011)

  2. Augonnet, C., Thibault, S., Namyst, R., Wacrenier, P.A.: StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr. Comput. Pract. Exp. 23(2), 187–198 (2011)

    Article  Google Scholar 

  3. Baggag, A., Atkins, H., Keyes, D.: Parallel implementation of the discontinuous Galerkin method. Tech. Rep. ICASE-99-35, Institute for Computer Applications in Science and Engineering (1999)

  4. Balaji, P.: Programming Models for Parallel Computing. MIT Press, Cambridge (2015)

    Book  MATH  Google Scholar 

  5. Barat, R.: Load balancing of multi-physics simulation by multi-criteria graph partitioning. Ph.D. thesis, Université de Bordeaux (2017)

  6. Bauer, M., Treichler, S., Slaughter, E., Aiken, A.: Legion: Expressing locality and independence with logical regions. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, p. 66. IEEE Computer Society Press (2012)

  7. Bremer, M.H., Bachan, J.D., Chan, C.P.: Semi-static and dynamic load balancing for asynchronous hurricane storm surge simulations. In: Proceedings of the Parallel Applications Workshop, Alternatives to MPI, p. 13. IEEE (2018)

  8. Browne, S., Dongarra, J., Garner, N., Ho, G., Mucci, P.: A portable programming interface for performance evaluation on modern processors. Int. J. High Perform. Comput. Appl. 14(3), 189–204 (2000)

    Article  Google Scholar 

  9. Brus, S.: Efficiency improvements for modeling coastal hydrodynamics through the application of high-order discontinuous Galerkin solutions to the shallow water equations. Ph.D. thesis, University of Notre Dame (2017)

  10. Brus, S.R., Wirasaet, D., Westerink, J.J., Dawson, C.: Performance and scalability improvements for discontinuous Galerkin solutions to conservation laws on unstructured grids. J. Sci. Comput. 70(1), 210–242 (2017). https://doi.org/10.1007/s10915-016-0249-y

    Article  MathSciNet  MATH  Google Scholar 

  11. Bunya, S., Dietrich, J., Westerink, J., Ebersole, B., Smith, J., Atkinson, J., Jensen, R., Resio, D., Luettich, R., Dawson, C., Cardone, V., Cox, A., Powell, M., Westerink, H., Roberts, H.: A high resolution coupled riverine flow, tide, wind, wind wave and storm surge model for Southern Louisiana and Mississippi: part I—model development and validation. Mon. Weather Rev. 138, 345–377 (2010)

    Article  Google Scholar 

  12. Bunya, S., Kubatko, E.J., Westerink, J.J., Dawson, C.: A wetting and drying treatment for the Runge–Kutta discontinuous Galerkin solution to the shallow water equations. Comput. Methods Appl. Mech. Eng. 198(17), 1548–1562 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  13. Chamberlain, B., Callahan, D., Zima, H.: Parallel programmability and the Chapel language. Int. J. High Perform. Comput. Appl. 21(3), 291–312 (2007)

    Article  Google Scholar 

  14. Cockburn, B., Shu, C.W.: TVB Runge–Kutta local projection discontinuous Galerkin finite element method for conservation laws. II. General framework. Math. Comput. 52(186), 411–435 (1989)

    MathSciNet  MATH  Google Scholar 

  15. Cockburn, B., Shu, C.W.: Runge–Kutta discontinuous Galerkin methods for convection-dominated problems. J. Sci. Comput. 16(3), 173–261 (2001)

    Article  MathSciNet  MATH  Google Scholar 

  16. Dawson, C., Kubatko, E.J., Westerink, J.J., Trahan, C., Mirabito, C., Michoski, C., Panda, N.: Discontinuous Galerkin methods for modeling hurricane storm surge. Adv. Water Resour. 34(9), 1165–1176 (2011)

    Article  Google Scholar 

  17. Dietrich, J., Westerink, J., Kennedy, A., Smith, J., Jensen, R.E., Zijlema, M., Holthuijsen, L., Dawson, C., Luettich, R., Powell, M., Cardone, V., Cox, A., Stone, G., Pourtaheri, H., Hope, M., Tanaka, S., Westerink, L., Westerink, H.J., Cobell, Z.: Hurricane Gustav (2008) waves and storm surge: hindcast, synoptic analysis and validation in Southern Louisiana. Mon. Weather Rev. 139, 2488–2522 (2011)

    Article  Google Scholar 

  18. Dietrich, J., Zijlema, M., Westerink, J., Holtjuijsen, L., Dawson, C., Luettich Jr., R.A., Jensen, R., Smith, J., Stelling, G., Stone, G.: Modeling hurricane wave and storm surge using integrally-coupled, scalable computations. Coast. Eng. 58, 45–65 (2011)

    Article  Google Scholar 

  19. Dietrich, J.C., Bunya, S., Westerink, J.J., Ebersole, B.A., Smith, J.M., Atkinson, J.H., Jensen, R., Resio, D.T., Luettich, R.A., Dawson, C., Cardone, V.J., Cox, A.T., Powell, M.D., Westerink, H.J., Roberts, H.J.: A high resolution coupled riverine flow, tide, wind, wind wave and storm surge model for southern Louisiana and Mississippi: part II—synoptic description and analyses of Hurricanes Katrina and Rita. Mon. Weather Rev. 138, 378–404 (2010)

    Article  Google Scholar 

  20. Dubiner, M.: Spectral methods on triangles and other domains. J. Sci. Comput. 6(4), 345–390 (1991)

    Article  MathSciNet  MATH  Google Scholar 

  21. Dutykh, D., Clamond, D.: Modified shallow water equations for significantly varying seabeds. Appl. Math. Model. 40(23), 9767–9787 (2016). https://doi.org/10.1016/j.apm.2016.06.033

    Article  MathSciNet  Google Scholar 

  22. El-Ghazawi, T., Carlson, W., Sterling, T., Yelick, K.: UPC: Distributed Shared Memory Programming, vol. 40. Wiley, London (2005)

    Book  Google Scholar 

  23. Gandham, R., Medina, D., Warburton, T.: GPU accelerated discontinuous Galerkin methods for shallow water equations. Commun. Comput. Phys. 18(1), 3764 (2015). https://doi.org/10.4208/cicp.070114.271114a

    Article  MathSciNet  MATH  Google Scholar 

  24. Gottlieb, S., Shu, C.W., Tadmor, E.: Strong stability-preserving high-order time discretization methods. SIAM Rev. 43(1), 89–112 (2001)

    Article  MathSciNet  MATH  Google Scholar 

  25. Grubel, P., Kaiser, H., Cook, J., Serio, A.: The performance implication of task size for applications on the HPX runtime system. In: 2015 IEEE International Conference on Cluster Computing (CLUSTER), pp. 682–689. IEEE (2015)

  26. Grubel, P., Kaiser, H., Huck, K.A., Cook, J.: Using intrinsic performance counters to assess efficiency in task-based parallel applications. In: 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 1692–1701 (2016)

  27. Grun, P., Hefty, S., Sur, S., Goodell, D., Russell, R.D., Pritchard, H., Squyres, J.M.: A brief introduction to the OpenFabrics interfaces-a new network API for maximizing high performance application efficiency. In: 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects (HOTI), pp. 34–39. IEEE (2015)

  28. Heller, T., Diehl, P., Byerly, Z., Biddiscombe, J., Kaiser, H.: HPX—an open source C++ standard library for parallelism and concurrency. In: Proceedings of OpenSuCo, OpenSuCo’17. ACM (2017)

  29. Heller, T., Kaiser, H., Diehl, P., Fey, D., Schweitzer, M.A.: Closing the Performance Gap with Modern C++. In: Taufer, M., Mohr, B., Kunkel, J.M. (eds.) High Performance Computing, Lecture Notes in Computer Science, vol. 9945, pp. 18–31. Springer International Publishing, Berlin (2016)

  30. Heller, T., Kaiser, H., Schäfer, A., Fey, D.: Using HPX and LibGeoDecomp for scaling HPC applications on heterogeneous supercomputers. In: Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, p. 1. ACM (2013)

  31. Hesthaven, J.S., Warburton, T.: Nodal Discontinuous Galerkin Methods: Algorithms, Analysis, and Applications. Springer, Berlin (2007)

    MATH  Google Scholar 

  32. Hope, M.E., Westerink, J.J., Kennedy, A.B., Kerr, P.C., Dietrich, J.C., Dawson, C., Bender, C.J., et al.: Hindcast and validation of Hurricane Ike (2008) waves, forerunner, and storm surge. J. Geophys. Res. Oceans 118, 4424–4460 (2013)

    Article  Google Scholar 

  33. Hope, M., Westerink, J., Kennedy, A., Smith, J., Westerink, H., Cox, A., Nong, S., Roberts, K., Resio, D., A.P, T.: Hurricane Sandy (2012) wind, waves and storm surge in New York Bight. I: Model validation. J. Waterw. Port Coast. Ocean Eng. (2016)

  34. Iglberger, K., Hager, G., Treibig, J., Rüde, U.: Expression templates revisited: a performance analysis of current methodologies. SIAM J. Sci. Comput. 34(2), C42–C69 (2012)

    Article  MathSciNet  Google Scholar 

  35. Kaiser, H., Adelstein Lelbach, B., Heller, T., Berg, A., Biddiscombe, J., Bikineev, A., et al.: STEllAR-GROUP/hpx: HPX V1.0: The C++ standards library for parallelism and concurrency (Version 1.0.0). Zenodo. https://doi.org/10.5281/zenodo.556772 (2107)

  36. Kale, L.V., Krishnan, S.: CHARM++: A portable concurrent object oriented system based on C++. In: ACM Sigplan Notices, vol. 28, pp. 91–108. ACM (1993)

  37. Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20(1), 359–392 (1998)

    Article  MathSciNet  MATH  Google Scholar 

  38. Kogge, P., Shalf, J.: Exascale computing trends: adjusting to the “new normal” for computer architecture. Comput. Sci. Eng. 15(6), 16–26 (2013)

    Article  Google Scholar 

  39. Kubatko, E., Bunya, S., Dawson, C., Westerink, J.: Dynamic p-adaptive Runge–Kutta discontinuous Galerkin methods for the shallow water equations. Comput. Methods Appl. Mech. Eng. 198, 1766–1774 (2009)

    Article  MATH  Google Scholar 

  40. Kubatko, E., Bunya, S., Dawson, C., Westerink, J., Mirabito, C.: A performance comparison of continuous and discontinuous finite element shallow water models. J. Sci. Comput. 40, 315–339 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  41. Kubatko, E., Westerink, J., Dawson, C.: \(hp\) Discontinuous Galerkin methods for advection dominated problems in shallow water flow. Comput. Methods Appl. Mech. Eng. 196, 437–451 (2006)

    Article  MATH  Google Scholar 

  42. Kubatko, E.J., Westerink, J.J., Dawson, C.: Semi discrete discontinuous Galerkin methods and stage-exceeding-order, strong-stability-preserving Runge–Kutta time discretizations. J. Comput. Phys. 222(2), 832–848 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  43. Luettich, R., et al. J.W.: ADCIRC: a parallel advanced circulation model for oceanic, coastal and estuarine waters (2017). Users manual www.adcirc.org

  44. Michoski, C., Alexanderian, A., Paillet, C., Kubatko, E., Dawson, C.: Stability of nonlinear convection-diffusion-reaction systems in discontinuous Galerkin methods. J. Sci. Comput. 70, 516–550 (2017)

    Article  MathSciNet  MATH  Google Scholar 

  45. Michoski, C., Dawson, C., Kubatko, E., Wirasaet, D., Brus, S., Westerink, J.: A comparison of artificial viscosity, limiters, and filter, for high order discontinuous Galerkin solution in nonlinear settings. J. Sci. Comput. (2015). https://doi.org/10.1007/s10915.015.0027.2

  46. Michoski, C., Dawson, C., Mirabito, C., Kubatko, E., Wirasaet, D., Westerink, J.: Fully coupled methods for multiphase morphodynamics. Adv. Water Resour. 59, 95–110 (2013)

    Article  MATH  Google Scholar 

  47. Michoski, C., Mirabito, C., Dawson, C., Wirasaet, D., Kubatko, E.J., Westerink, J.J.: Adaptive hierarchic transformations for dynamically p-enriched slope-limiting over discontinuous Galerkin systems of generalized equations. J. Comput. Phys. 230(22), 8028–8056 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  48. Numrich, R.W., Reid, J.: Co-Array Fortran for parallel programming. In: ACM Sigplan Fortran Forum, vol. 17, pp. 1–31. ACM (1998)

  49. OpenMP Architecture Review Board: OpenMP Application Program Interface Version 3.0 (2008). http://www.openmp.org/mp-documents/spec30.pdf

  50. Reed, W.H., Hill, T.: Triangular mesh methods for the neutron transport equation. Tech. rep., Los Alamos Scientific Lab., N. Mex. (USA) (1973)

  51. Tanaka, S., Bunya, S., Westerink, J., Dawson, C., Luettich, R.: Scalability of an unstructured grid continuous Galerkin based hurricane storm surge model. J. Sci. Comput. 46, 329–358 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  52. Westerink, J.J., Luettich, R.A., Feyen, J.C., Atkinson, J.H., Dawson, C.N., Roberts, H.J., Powell, M.D., Dunion, J.P., Kubatko, E.J., Pourtaheri, H.: A basin to channel scale unstructured grid hurricane storm surge model applied to southern Louisiana. Mon. Weather Rev. 136, 833–864 (2008)

    Article  Google Scholar 

  53. Wirasaet, D., Brus, S., Michoski, C., Kubatko, E., Westerink, J.: Artificial boundary layers in discontinuous Galerkin solutions to shallow water equations in channels. J. Comput. Physics 299, 597–612 (2015)

    Article  MathSciNet  MATH  Google Scholar 

  54. Wirasaet, D., Kubatko, E., Michoski, C., Tanaka, S., Westerink, J., Dawson, C.: Discontinuous Galerkin methods with nodal and hybrid modal/nodal triangular, quadrilateral, and polygonal elements for nonlinear shallow water flow. Comput. Methods Appl. Mech. Eng. 270, 113–149 (2014)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Acknowledgements

This work was supported by the National Science Foundation under Grants ACI-1339801 and ACI-1339782 and the U.S. Department of Energy through the Computational Science Graduate Fellowship, Grant DE-FG02-97ER25308. Additionally, the authors would like to acknowledge the Texas Advanced Computing Center (TACC) at the University of Texas at Austin for providing the HPC resources that enabled this research. The presented results were obtained through XSEDE allocation TG-DMS080016N.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Maximilian Bremer.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (patch 15 KB)

Artifact Description: Performance Comparison of HPX Versus Traditional Parallelization Strategies for the Discontinuous Galerkin Method

Artifact Description: Performance Comparison of HPX Versus Traditional Parallelization Strategies for the Discontinuous Galerkin Method

1.1 Abstract

As part of the Association for Computing Machinery’s reproducibility initiative, this artifact outlines the details of generating the computational results in the paper, “Performance Comparison of HPX Versus Traditional Parallelization strategies for the Discontinuous Galerkin Method”.

1.2 Description

1.2.1 Check-List (Artifact Meta Information)

  • Algorithm: Discontinuous Galerkin Finite Element Method on Unstructured Meshes

  • Program: C++14

  • Compilation: GNU C++14 Compiler

  • Operating System: CentOS Linux release 7.4.1708

  • Run-time environment: Stampede2

  • Execution: Parallel/Distributed

  • Experiment workflow: Compile; Preprocess meshes; Run simulation.

  • Publicly available?: Yes

1.2.2 How Software Can be Obtained

dgswem-v2 has been released via the MIT license. The code can be obtained from the GitHub repository: https://github.com/UT-CHG/dgswemv2.

1.2.3 Software Dependencies

The building of the software stack involves several libraries. We describe the dependencies here, but provide specific version numbers in Table 4. To build the software stack, we use C++14 conforming version of the GNU C and C++ compiler. We require cmake to build yaml-cpp, HPX, and dgswem-v2. The preprocessor requires METIS to decompose the meshes, and yaml-cpp is required to parse the input files. In addition to cmake, HPX is also compiled with the boost, an MPI implementation, hwloc, and jemalloc libraries.

Table 4 Version numbers or git hashes for all the software dependencies on Stampede2

1.3 Installation

To build dgswem-v2, we refer the reader to the README.md page at the root level of the GitHub repository. Additionally, we have made several Stampede2 specific optimizations. Firstly, we have compiled the code on the KNL architecture with the -march=native flag. Secondly, we have made several minor changes to optimize performance, i.e. disabling slope limiting and linking MKL. These changes have been provided as a patch file in the supplementary material.

1.4 Experiment Workflow

The workflow requires generating meshes, partitioning meshes, and executing the simulations. For brevity, we omit the details here. However, they can be found in the dgswem-v2: User’s Guide, located in the documentation/users-guide directory of the repository.

1.5 Evaluation and Expected Result

Each simulation run will return the execution time in microseconds to the standard output.

1.6 Notes

dgswem-v2 is under development at the date of publication, and all new contributions can be found at https://github.com/UT-CHG/dgswemv2. If there are any comments, questions, or suggestions, we encourage communicating with the developers through the GitHub issues page.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Bremer, M., Kazhyken, K., Kaiser, H. et al. Performance Comparison of HPX Versus Traditional Parallelization Strategies for the Discontinuous Galerkin Method. J Sci Comput 80, 878–902 (2019). https://doi.org/10.1007/s10915-019-00960-z

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10915-019-00960-z

Keywords

Navigation