Abstract
This paper presents a computational framework developed to improve both the serial and parallel performance of two-dimensional, unstructured, discontinuous Galerkin (DG) solutions to hyperbolic conservation laws. The coding techniques employed take into account current trends in HPC technologies. They are designed to maximize loop vectorization, utilize cache efficiently, facilitate straightforward shared-memory parallelization, reduce message-passing volume, and increase the overlap between computation and communication. With today’s CPU technology and HPC networks evolving rapidly, it is important to quantitatively assess these methodologies and compare them with standard paradigms in order to make the most of current computational resources. In our benchmark studies, we investigate the shallow water equations and show that the refactored implementation provides a significant performance increase over the conventional elemental DG code structure in terms of both CPU time and parallel scalability. The serial optimizations yield a 28–38% performance increase. For parallel computations, our improvements give a speedup factor of 1.5–2.0 for local problem sizes between 10 and 2000 elements per core, regardless of the overall problem size. The computational benchmarks were performed on the Lonestar and Stampede supercomputers at the Texas Advanced Computing Center.
Acknowledgments
This work was supported by the National Science Foundation Grants DMS-1228212, ACI-1339738, and ACI-1339801. J.J. Westerink was also partly supported by the Henry J. Massman and the Joseph and Nona Ahearn endowments at the University of Notre Dame. The authors acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin (http://www.tacc.utexas.edu) for providing HPC resources that have contributed to the research results reported within this paper. We would also like to thank TACC research scientist John McCalpin for assisting with the hardware counter results. The benchmark studies were performed using the XSEDE Allocation TG-DMS080016N.
About this article
Cite this article
Brus, S.R., Wirasaet, D., Westerink, J.J. et al. Performance and Scalability Improvements for Discontinuous Galerkin Solutions to Conservation Laws on Unstructured Grids. J Sci Comput 70, 210–242 (2017). https://doi.org/10.1007/s10915-016-0249-y
DOI: https://doi.org/10.1007/s10915-016-0249-y