Performance and Scalability Improvements for Discontinuous Galerkin Solutions to Conservation Laws on Unstructured Grids

Journal of Scientific Computing

Abstract

This paper presents a computational framework developed to improve both the serial and parallel performance of two-dimensional, unstructured, discontinuous Galerkin (DG) solutions to hyperbolic conservation laws. The coding techniques employed account for current trends in HPC technologies. They are designed to maximize loop vectorization, use cache efficiently, facilitate straightforward shared-memory parallelization, reduce message-passing volume, and increase the overlap between computation and communication. With today's CPU technology and HPC networks evolving rapidly, it is important to quantitatively assess these methodologies and compare them with standard paradigms in order to make full use of current computational resources. In our benchmark studies, we investigate the shallow water equations to show that the refactored implementation provides a significant performance increase over the conventional elemental DG code structure in terms of both CPU time and parallel scalability. Our results show that the serial optimizations yield a 28–38% performance increase. For parallel computations, our improvements give a speedup factor of 1.5–2.0 for local problem sizes between 10 and 2000 elements per core, regardless of the overall problem size. The computational benchmarks were performed on the Lonestar and Stampede supercomputers at the Texas Advanced Computing Center.

Acknowledgments

This work was supported by National Science Foundation grants DMS-1228212, ACI-1339738, and ACI-1339801. J.J. Westerink was also partly supported by the Henry J. Massman and the Joseph and Nona Ahearn endowments at the University of Notre Dame. The authors acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin (http://www.tacc.utexas.edu) for providing HPC resources that have contributed to the research results reported within this paper. We also thank TACC research scientist John McCalpin for assisting with the hardware counter results. The benchmark studies were performed using the XSEDE Allocation TG-DMS080016N.

Author information

Corresponding author

Correspondence to S. R. Brus.

About this article

Cite this article

Brus, S.R., Wirasaet, D., Westerink, J.J. et al. Performance and Scalability Improvements for Discontinuous Galerkin Solutions to Conservation Laws on Unstructured Grids. J Sci Comput 70, 210–242 (2017). https://doi.org/10.1007/s10915-016-0249-y

  • DOI: https://doi.org/10.1007/s10915-016-0249-y
