Journal of Scientific Computing, Volume 70, Issue 1, pp 210–242

Performance and Scalability Improvements for Discontinuous Galerkin Solutions to Conservation Laws on Unstructured Grids

  • S. R. Brus
  • D. Wirasaet
  • J. J. Westerink
  • C. Dawson


This paper presents a computational framework developed to improve both the serial and parallel performance of two-dimensional, unstructured, discontinuous Galerkin (DG) solutions to hyperbolic conservation laws. The coding techniques employed account for current trends in HPC technology. They are designed to maximize loop vectorization, use cache efficiently, facilitate straightforward shared-memory parallelization, reduce message-passing volume, and increase the overlap between computation and communication. With today's CPU technology and HPC networks evolving rapidly, it is important to quantitatively assess these methodologies and compare them with standard paradigms in order to make full use of current computational resources. In our benchmark studies, we specifically investigate the shallow water equations to show that the refactored algorithm implementation provides a significant performance increase over the conventional elemental DG code structure in terms of both CPU time and parallel scalability. Our results show that the serial optimizations result in a 28–38% performance increase. For parallel computations, our improvements yield a speedup factor of 1.5–2.0 for local problem sizes between 10 and 2000 elements per core, regardless of the overall problem size. The computational benchmarks were performed on the Lonestar and Stampede supercomputers at the Texas Advanced Computing Center.


Keywords: Parallel computing; Conservation laws; Shallow water equations; Discontinuous Galerkin; Finite element method



This work was supported by the National Science Foundation Grants DMS-1228212, ACI-1339738, and ACI-1339801. J.J. Westerink was also partly supported by the Henry J. Massman and the Joseph and Nona Ahearn endowments at the University of Notre Dame. The authors acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing HPC resources that have contributed to the research results reported within this paper. We also thank TACC research scientist John McCalpin for assisting with the hardware counter results. The benchmark studies were performed using the XSEDE Allocation TG-DMS080016N.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • S. R. Brus (1)
  • D. Wirasaet (1)
  • J. J. Westerink (1)
  • C. Dawson (2)

  1. Computational Hydraulics Laboratory, Department of Civil and Environmental Engineering and Earth Sciences, University of Notre Dame, Notre Dame, USA
  2. Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin, USA
