Performance Comparison of HPX Versus Traditional Parallelization Strategies for the Discontinuous Galerkin Method

Bremer, Maximilian; Kazhyken, Kazbek; Kaiser, Hartmut; Michoski, Craig; Dawson, Clint

doi:10.1007/s10915-019-00960-z

Performance Comparison of HPX Versus Traditional Parallelization Strategies for the Discontinuous Galerkin Method

Published: 02 May 2019

Volume 80, pages 878–902, (2019)
Cite this article

Journal of Scientific Computing Aims and scope Submit manuscript

Maximilian Bremer ORCID: orcid.org/0000-0002-5940-3432¹,
Kazbek Kazhyken¹,
Hartmut Kaiser²,
Craig Michoski¹ &
…
Clint Dawson¹

515 Accesses
7 Citations
Explore all metrics

Abstract

As high performance computing moves towards the exascale computing regime, applications are required to expose increasingly fine grain parallelism to efficiently use next generation supercomputers. Intended as a solution to the programming challenges associated with these architectures, High Performance ParalleX (HPX) is a task-based C++ runtime, which emphasizes the use of lightweight threads and algorithm-dependent synchronization to maximize parallelism exposed by the application to the machine. The aim of this work is to explore the performance benefits of an HPX parallelization versus a MPI parallelization for the discontinuous Galerkin finite element method for the two-dimensional shallow water equations. We present strong and weak scaling results comparing the performance of HPX versus a MPI parallelization strategy on Knights Landing architectures. Our results indicate that for average task sizes of \(3.6\,{\mathrm {m s}}\), HPX’s runtime overhead is offset by more efficient execution of the application. Furthermore, we demonstrate that running with sufficiently large task granularity, HPX is able to outperform the MPI parallelization by a factor of approximately 1.2 for up to 128 nodes.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Multilevel Parallelization: Grid Methods for Solving Direct and Inverse Problems

Performance modeling of hyper-scale custom machine for the principal steps in block Wiedemann algorithm

Article 03 June 2016

Tong Zhou & Jingfei Jiang

A comparison of various schemes for solving the transport equation in many-core platforms

Article 21 October 2016

Marcelo Bondarenco, Pablo Gamazo & Pablo Ezzatti

Notes

http://stellar-group.org/.

References

Amarasinghe, S., Hall, M., Lethin, R., Pingali, K., Quinlan, D., Sarkar, V., Shalf, J., Lucas, R., Yelick, K., Balanji, P., et al.: Exascale programming challenges. In: Proceedings of the Workshop on Exascale Programming Challenges, Marina del Rey, CA, USA. US Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR) (2011)
Augonnet, C., Thibault, S., Namyst, R., Wacrenier, P.A.: StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr. Comput. Pract. Exp. 23(2), 187–198 (2011)
Article Google Scholar
Baggag, A., Atkins, H., Keyes, D.: Parallel implementation of the discontinuous Galerkin method. Tech. Rep. ICASE-99-35, Institute for Computer Applications in Science and Engineering (1999)
Balaji, P.: Programming Models for Parallel Computing. MIT Press, Cambridge (2015)
Book MATH Google Scholar
Barat, R.: Load balancing of multi-physics simulation by multi-criteria graph partitioning. Ph.D. thesis, Université de Bordeaux (2017)
Bauer, M., Treichler, S., Slaughter, E., Aiken, A.: Legion: Expressing locality and independence with logical regions. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, p. 66. IEEE Computer Society Press (2012)
Bremer, M.H., Bachan, J.D., Chan, C.P.: Semi-static and dynamic load balancing for asynchronous hurricane storm surge simulations. In: Proceedings of the Parallel Applications Workshop, Alternatives to MPI, p. 13. IEEE (2018)
Browne, S., Dongarra, J., Garner, N., Ho, G., Mucci, P.: A portable programming interface for performance evaluation on modern processors. Int. J. High Perform. Comput. Appl. 14(3), 189–204 (2000)
Article Google Scholar
Brus, S.: Efficiency improvements for modeling coastal hydrodynamics through the application of high-order discontinuous Galerkin solutions to the shallow water equations. Ph.D. thesis, University of Notre Dame (2017)
Brus, S.R., Wirasaet, D., Westerink, J.J., Dawson, C.: Performance and scalability improvements for discontinuous Galerkin solutions to conservation laws on unstructured grids. J. Sci. Comput. 70(1), 210–242 (2017). https://doi.org/10.1007/s10915-016-0249-y
Article MathSciNet MATH Google Scholar
Bunya, S., Dietrich, J., Westerink, J., Ebersole, B., Smith, J., Atkinson, J., Jensen, R., Resio, D., Luettich, R., Dawson, C., Cardone, V., Cox, A., Powell, M., Westerink, H., Roberts, H.: A high resolution coupled riverine flow, tide, wind, wind wave and storm surge model for Southern Louisiana and Mississippi: part I—model development and validation. Mon. Weather Rev. 138, 345–377 (2010)
Article Google Scholar
Bunya, S., Kubatko, E.J., Westerink, J.J., Dawson, C.: A wetting and drying treatment for the Runge–Kutta discontinuous Galerkin solution to the shallow water equations. Comput. Methods Appl. Mech. Eng. 198(17), 1548–1562 (2009)
Article MathSciNet MATH Google Scholar
Chamberlain, B., Callahan, D., Zima, H.: Parallel programmability and the Chapel language. Int. J. High Perform. Comput. Appl. 21(3), 291–312 (2007)
Article Google Scholar
Cockburn, B., Shu, C.W.: TVB Runge–Kutta local projection discontinuous Galerkin finite element method for conservation laws. II. General framework. Math. Comput. 52(186), 411–435 (1989)
MathSciNet MATH Google Scholar
Cockburn, B., Shu, C.W.: Runge–Kutta discontinuous Galerkin methods for convection-dominated problems. J. Sci. Comput. 16(3), 173–261 (2001)
Article MathSciNet MATH Google Scholar
Dawson, C., Kubatko, E.J., Westerink, J.J., Trahan, C., Mirabito, C., Michoski, C., Panda, N.: Discontinuous Galerkin methods for modeling hurricane storm surge. Adv. Water Resour. 34(9), 1165–1176 (2011)
Article Google Scholar
Dietrich, J., Westerink, J., Kennedy, A., Smith, J., Jensen, R.E., Zijlema, M., Holthuijsen, L., Dawson, C., Luettich, R., Powell, M., Cardone, V., Cox, A., Stone, G., Pourtaheri, H., Hope, M., Tanaka, S., Westerink, L., Westerink, H.J., Cobell, Z.: Hurricane Gustav (2008) waves and storm surge: hindcast, synoptic analysis and validation in Southern Louisiana. Mon. Weather Rev. 139, 2488–2522 (2011)
Article Google Scholar
Dietrich, J., Zijlema, M., Westerink, J., Holtjuijsen, L., Dawson, C., Luettich Jr., R.A., Jensen, R., Smith, J., Stelling, G., Stone, G.: Modeling hurricane wave and storm surge using integrally-coupled, scalable computations. Coast. Eng. 58, 45–65 (2011)
Article Google Scholar
Dietrich, J.C., Bunya, S., Westerink, J.J., Ebersole, B.A., Smith, J.M., Atkinson, J.H., Jensen, R., Resio, D.T., Luettich, R.A., Dawson, C., Cardone, V.J., Cox, A.T., Powell, M.D., Westerink, H.J., Roberts, H.J.: A high resolution coupled riverine flow, tide, wind, wind wave and storm surge model for southern Louisiana and Mississippi: part II—synoptic description and analyses of Hurricanes Katrina and Rita. Mon. Weather Rev. 138, 378–404 (2010)
Article Google Scholar
Dubiner, M.: Spectral methods on triangles and other domains. J. Sci. Comput. 6(4), 345–390 (1991)
Article MathSciNet MATH Google Scholar
Dutykh, D., Clamond, D.: Modified shallow water equations for significantly varying seabeds. Appl. Math. Model. 40(23), 9767–9787 (2016). https://doi.org/10.1016/j.apm.2016.06.033
Article MathSciNet Google Scholar
El-Ghazawi, T., Carlson, W., Sterling, T., Yelick, K.: UPC: Distributed Shared Memory Programming, vol. 40. Wiley, London (2005)
Book Google Scholar
Gandham, R., Medina, D., Warburton, T.: GPU accelerated discontinuous Galerkin methods for shallow water equations. Commun. Comput. Phys. 18(1), 3764 (2015). https://doi.org/10.4208/cicp.070114.271114a
Article MathSciNet MATH Google Scholar
Gottlieb, S., Shu, C.W., Tadmor, E.: Strong stability-preserving high-order time discretization methods. SIAM Rev. 43(1), 89–112 (2001)
Article MathSciNet MATH Google Scholar
Grubel, P., Kaiser, H., Cook, J., Serio, A.: The performance implication of task size for applications on the HPX runtime system. In: 2015 IEEE International Conference on Cluster Computing (CLUSTER), pp. 682–689. IEEE (2015)
Grubel, P., Kaiser, H., Huck, K.A., Cook, J.: Using intrinsic performance counters to assess efficiency in task-based parallel applications. In: 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 1692–1701 (2016)
Grun, P., Hefty, S., Sur, S., Goodell, D., Russell, R.D., Pritchard, H., Squyres, J.M.: A brief introduction to the OpenFabrics interfaces-a new network API for maximizing high performance application efficiency. In: 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects (HOTI), pp. 34–39. IEEE (2015)
Heller, T., Diehl, P., Byerly, Z., Biddiscombe, J., Kaiser, H.: HPX—an open source C++ standard library for parallelism and concurrency. In: Proceedings of OpenSuCo, OpenSuCo’17. ACM (2017)
Heller, T., Kaiser, H., Diehl, P., Fey, D., Schweitzer, M.A.: Closing the Performance Gap with Modern C++. In: Taufer, M., Mohr, B., Kunkel, J.M. (eds.) High Performance Computing, Lecture Notes in Computer Science, vol. 9945, pp. 18–31. Springer International Publishing, Berlin (2016)
Heller, T., Kaiser, H., Schäfer, A., Fey, D.: Using HPX and LibGeoDecomp for scaling HPC applications on heterogeneous supercomputers. In: Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, p. 1. ACM (2013)
Hesthaven, J.S., Warburton, T.: Nodal Discontinuous Galerkin Methods: Algorithms, Analysis, and Applications. Springer, Berlin (2007)
MATH Google Scholar
Hope, M.E., Westerink, J.J., Kennedy, A.B., Kerr, P.C., Dietrich, J.C., Dawson, C., Bender, C.J., et al.: Hindcast and validation of Hurricane Ike (2008) waves, forerunner, and storm surge. J. Geophys. Res. Oceans 118, 4424–4460 (2013)
Article Google Scholar
Hope, M., Westerink, J., Kennedy, A., Smith, J., Westerink, H., Cox, A., Nong, S., Roberts, K., Resio, D., A.P, T.: Hurricane Sandy (2012) wind, waves and storm surge in New York Bight. I: Model validation. J. Waterw. Port Coast. Ocean Eng. (2016)
Iglberger, K., Hager, G., Treibig, J., Rüde, U.: Expression templates revisited: a performance analysis of current methodologies. SIAM J. Sci. Comput. 34(2), C42–C69 (2012)
Article MathSciNet Google Scholar
Kaiser, H., Adelstein Lelbach, B., Heller, T., Berg, A., Biddiscombe, J., Bikineev, A., et al.: STEllAR-GROUP/hpx: HPX V1.0: The C++ standards library for parallelism and concurrency (Version 1.0.0). Zenodo. https://doi.org/10.5281/zenodo.556772 (2107)
Kale, L.V., Krishnan, S.: CHARM++: A portable concurrent object oriented system based on C++. In: ACM Sigplan Notices, vol. 28, pp. 91–108. ACM (1993)
Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20(1), 359–392 (1998)
Article MathSciNet MATH Google Scholar
Kogge, P., Shalf, J.: Exascale computing trends: adjusting to the “new normal” for computer architecture. Comput. Sci. Eng. 15(6), 16–26 (2013)
Article Google Scholar
Kubatko, E., Bunya, S., Dawson, C., Westerink, J.: Dynamic p-adaptive Runge–Kutta discontinuous Galerkin methods for the shallow water equations. Comput. Methods Appl. Mech. Eng. 198, 1766–1774 (2009)
Article MATH Google Scholar
Kubatko, E., Bunya, S., Dawson, C., Westerink, J., Mirabito, C.: A performance comparison of continuous and discontinuous finite element shallow water models. J. Sci. Comput. 40, 315–339 (2009)
Article MathSciNet MATH Google Scholar
Kubatko, E., Westerink, J., Dawson, C.: \(hp\) Discontinuous Galerkin methods for advection dominated problems in shallow water flow. Comput. Methods Appl. Mech. Eng. 196, 437–451 (2006)
Article MATH Google Scholar
Kubatko, E.J., Westerink, J.J., Dawson, C.: Semi discrete discontinuous Galerkin methods and stage-exceeding-order, strong-stability-preserving Runge–Kutta time discretizations. J. Comput. Phys. 222(2), 832–848 (2007)
Article MathSciNet MATH Google Scholar
Luettich, R., et al. J.W.: ADCIRC: a parallel advanced circulation model for oceanic, coastal and estuarine waters (2017). Users manual www.adcirc.org
Michoski, C., Alexanderian, A., Paillet, C., Kubatko, E., Dawson, C.: Stability of nonlinear convection-diffusion-reaction systems in discontinuous Galerkin methods. J. Sci. Comput. 70, 516–550 (2017)
Article MathSciNet MATH Google Scholar
Michoski, C., Dawson, C., Kubatko, E., Wirasaet, D., Brus, S., Westerink, J.: A comparison of artificial viscosity, limiters, and filter, for high order discontinuous Galerkin solution in nonlinear settings. J. Sci. Comput. (2015). https://doi.org/10.1007/s10915.015.0027.2
Michoski, C., Dawson, C., Mirabito, C., Kubatko, E., Wirasaet, D., Westerink, J.: Fully coupled methods for multiphase morphodynamics. Adv. Water Resour. 59, 95–110 (2013)
Article MATH Google Scholar
Michoski, C., Mirabito, C., Dawson, C., Wirasaet, D., Kubatko, E.J., Westerink, J.J.: Adaptive hierarchic transformations for dynamically p-enriched slope-limiting over discontinuous Galerkin systems of generalized equations. J. Comput. Phys. 230(22), 8028–8056 (2011)
Article MathSciNet MATH Google Scholar
Numrich, R.W., Reid, J.: Co-Array Fortran for parallel programming. In: ACM Sigplan Fortran Forum, vol. 17, pp. 1–31. ACM (1998)
OpenMP Architecture Review Board: OpenMP Application Program Interface Version 3.0 (2008). http://www.openmp.org/mp-documents/spec30.pdf
Reed, W.H., Hill, T.: Triangular mesh methods for the neutron transport equation. Tech. rep., Los Alamos Scientific Lab., N. Mex. (USA) (1973)
Tanaka, S., Bunya, S., Westerink, J., Dawson, C., Luettich, R.: Scalability of an unstructured grid continuous Galerkin based hurricane storm surge model. J. Sci. Comput. 46, 329–358 (2011)
Article MathSciNet MATH Google Scholar
Westerink, J.J., Luettich, R.A., Feyen, J.C., Atkinson, J.H., Dawson, C.N., Roberts, H.J., Powell, M.D., Dunion, J.P., Kubatko, E.J., Pourtaheri, H.: A basin to channel scale unstructured grid hurricane storm surge model applied to southern Louisiana. Mon. Weather Rev. 136, 833–864 (2008)
Article Google Scholar
Wirasaet, D., Brus, S., Michoski, C., Kubatko, E., Westerink, J.: Artificial boundary layers in discontinuous Galerkin solutions to shallow water equations in channels. J. Comput. Physics 299, 597–612 (2015)
Article MathSciNet MATH Google Scholar
Wirasaet, D., Kubatko, E., Michoski, C., Tanaka, S., Westerink, J., Dawson, C.: Discontinuous Galerkin methods with nodal and hybrid modal/nodal triangular, quadrilateral, and polygonal elements for nonlinear shallow water flow. Comput. Methods Appl. Mech. Eng. 270, 113–149 (2014)
Article MathSciNet MATH Google Scholar

Download references

Acknowledgements

This work was supported by the National Science Foundation under Grants ACI-1339801 and ACI-1339782 and the U.S. Department of Energy through the Computational Science Graduate Fellowship, Grant DE-FG02-97ER25308. Additionally, the authors would like to acknowledge the Texas Advanced Computing Center (TACC) at the University of Texas at Austin for providing the HPC resources that enabled this research. The presented results were obtained through XSEDE allocation TG-DMS080016N.

Author information

Authors and Affiliations

Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin, TX, USA
Maximilian Bremer, Kazbek Kazhyken, Craig Michoski & Clint Dawson
Center for Computation and Technology, Louisiana State University, Baton Rouge, LA, USA
Hartmut Kaiser

Authors

Maximilian Bremer
View author publications
You can also search for this author in PubMed Google Scholar
Kazbek Kazhyken
View author publications
You can also search for this author in PubMed Google Scholar
Hartmut Kaiser
View author publications
You can also search for this author in PubMed Google Scholar
Craig Michoski
View author publications
You can also search for this author in PubMed Google Scholar
Clint Dawson
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Maximilian Bremer.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (patch 15 KB)

Artifact Description: Performance Comparison of HPX Versus Traditional Parallelization Strategies for the Discontinuous Galerkin Method

1.1 Abstract

As part of the Association for Computing Machinery’s reproducibility initiative, this artifact outlines the details of generating the computational results in the paper, “Performance Comparison of HPX Versus Traditional Parallelization strategies for the Discontinuous Galerkin Method”.

1.2 Description

1.2.1 Check-List (Artifact Meta Information)

Algorithm: Discontinuous Galerkin Finite Element Method on Unstructured Meshes
Program: C++14
Compilation: GNU C++14 Compiler
Operating System: CentOS Linux release 7.4.1708
Run-time environment: Stampede2
Execution: Parallel/Distributed
Experiment workflow: Compile; Preprocess meshes; Run simulation.
Publicly available?: Yes

1.2.2 How Software Can be Obtained

dgswem-v2 has been released via the MIT license. The code can be obtained from the GitHub repository: https://github.com/UT-CHG/dgswemv2.

1.2.3 Software Dependencies

The building of the software stack involves several libraries. We describe the dependencies here, but provide specific version numbers in Table 4. To build the software stack, we use C++14 conforming version of the GNU C and C++ compiler. We require cmake to build yaml-cpp, HPX, and dgswem-v2. The preprocessor requires METIS to decompose the meshes, and yaml-cpp is required to parse the input files. In addition to cmake, HPX is also compiled with the boost, an MPI implementation, hwloc, and jemalloc libraries.

Table 4 Version numbers or git hashes for all the software dependencies on Stampede2

Full size table

1.3 Installation

To build dgswem-v2, we refer the reader to the README.md page at the root level of the GitHub repository. Additionally, we have made several Stampede2 specific optimizations. Firstly, we have compiled the code on the KNL architecture with the -march=native flag. Secondly, we have made several minor changes to optimize performance, i.e. disabling slope limiting and linking MKL. These changes have been provided as a patch file in the supplementary material.

1.4 Experiment Workflow

The workflow requires generating meshes, partitioning meshes, and executing the simulations. For brevity, we omit the details here. However, they can be found in the dgswem-v2: User’s Guide, located in the documentation/users-guide directory of the repository.

1.5 Evaluation and Expected Result

Each simulation run will return the execution time in microseconds to the standard output.

1.6 Notes

dgswem-v2 is under development at the date of publication, and all new contributions can be found at https://github.com/UT-CHG/dgswemv2. If there are any comments, questions, or suggestions, we encourage communicating with the developers through the GitHub issues page.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Bremer, M., Kazhyken, K., Kaiser, H. et al. Performance Comparison of HPX Versus Traditional Parallelization Strategies for the Discontinuous Galerkin Method. J Sci Comput 80, 878–902 (2019). https://doi.org/10.1007/s10915-019-00960-z

Download citation

Received: 02 May 2018
Revised: 23 February 2019
Accepted: 16 April 2019
Published: 02 May 2019
Issue Date: 15 August 2019
DOI: https://doi.org/10.1007/s10915-019-00960-z

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Performance Comparison of HPX Versus Traditional Parallelization Strategies for the Discontinuous Galerkin Method

Abstract

Access this article

Similar content being viewed by others

Multilevel Parallelization: Grid Methods for Solving Direct and Inverse Problems

Performance modeling of hyper-scale custom machine for the principal steps in block Wiedemann algorithm

A comparison of various schemes for solving the transport equation in many-core platforms

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Electronic supplementary material

Supplementary material 1 (patch 15 KB)

Artifact Description: Performance Comparison of HPX Versus Traditional Parallelization Strategies for the Discontinuous Galerkin Method

1.1 Abstract

1.2 Description

1.2.1 Check-List (Artifact Meta Information)

1.2.2 How Software Can be Obtained

1.2.3 Software Dependencies

1.3 Installation

1.4 Experiment Workflow

1.5 Evaluation and Expected Result

1.6 Notes

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Performance Comparison of HPX Versus Traditional Parallelization Strategies for the Discontinuous Galerkin Method

Abstract

Access this article

Similar content being viewed by others

Multilevel Parallelization: Grid Methods for Solving Direct and Inverse Problems

Performance modeling of hyper-scale custom machine for the principal steps in block Wiedemann algorithm

A comparison of various schemes for solving the transport equation in many-core platforms

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Electronic supplementary material

Supplementary material 1 (patch 15 KB)

Artifact Description: Performance Comparison of HPX Versus Traditional Parallelization Strategies for the Discontinuous Galerkin Method

Artifact Description: Performance Comparison of HPX Versus Traditional Parallelization Strategies for the Discontinuous Galerkin Method

1.1 Abstract

1.2 Description

1.2.1 Check-List (Artifact Meta Information)

1.2.2 How Software Can be Obtained

1.2.3 Software Dependencies

1.3 Installation

1.4 Experiment Workflow

1.5 Evaluation and Expected Result

1.6 Notes

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation