A Fast MHD Code for Gravitationally Stratified Media using Graphical Processing Units: SMAUG


Abstract

Parallelization techniques have been exploited most successfully by the gaming and graphics industry through the adoption of graphical processing units (GPUs), which possess hundreds of processor cores. This opportunity has been recognized by the computational science and engineering communities, which have recently and successfully harnessed the numerical performance of GPUs. For example, parallel magnetohydrodynamic (MHD) algorithms are important for numerical modelling of highly inhomogeneous solar, astrophysical and geophysical plasmas. Here, we describe the implementation of SMAUG, the Sheffield Magnetohydrodynamics Algorithm Using GPUs. SMAUG is a 1–3D MHD code capable of modelling magnetized and gravitationally stratified plasma.

The objective of this paper is to present the numerical methods and techniques used for porting the code to this novel and highly parallel compute architecture. The methods employed are justified by performance benchmarks and validation results, which demonstrate that the code successfully simulates the physics of a range of test scenarios, including a realistic 3D model of wave propagation in the solar atmosphere.


References

  • Brio, M., Wu, C. C. 1988, Journal of Computational Physics, 75, 400.

  • Caunt, S. E., Korpi, M. J. 2001, AA, 369, 706.

  • Chmielewski, P., Srivastava, A. K., Murawski, K., Musielak, Z. E. 2013, MNRAS, 428, 40–49.

  • Chmielewski, P., Murawski, K., Musielak, Z. E., Srivastava, A. K. 2014, AJ, 793, 43.

  • Cook, S. 2012, CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs (Applications of GPU Computing Series), Morgan Kaufmann.

  • Farber, R. 2011, CUDA Application Design and Development, Morgan Kaufmann.

  • Fedun, V., Erdélyi, R., Shelyag, S. 2009, SP, 258, 219.

  • Fedun, V., Verth, G., Jess, D. B., Erdélyi, R. 2011c, AJL, 740, L46.

  • Fedun, V., Shelyag, S., Verth, G., Mathioudakis, M., Erdélyi, R. 2011b, Annales Geophysicae, 29, 1029.

  • Fedun, V., Shelyag, S., Erdélyi, R. 2011a, AJ, 727, 17.

  • Govett, M. 2009, The Fortran-to-CUDA compiler.

  • Han, T. D., Abdelrahman, T. S. 2011, IEEE Transactions on Parallel and Distributed Systems, 22, 78.

  • Harvey, M. J., De Fabritiis, G. 2011, Computer Physics Communications, 182, 1093.

  • Hasan, S. S., van Ballegooijen, A. A., Kalkofen, W., Steiner, O. 2005, AJ, 631, 1270.

  • Kestener, P., Fromang, S. 2012, Astronomical Society of the Pacific Conference Series, 459, 222.

  • Kirk, D. 2010, Programming Massively Parallel Processors: A Hands-on Approach (Applications of GPU Computing Series), Morgan Kaufmann.

  • Kirk, D., Hwu, W. 2010, Programming Massively Parallel Processors: A Hands-on Approach, Elsevier Direct.

  • Lin, L., Ng, C.-S., Bhattacharjee, A. 2011, GPU Accelerated Reduced MHD Simulations of Coronal Loops, Twenty Second International Conference on Numerical Simulations of Plasmas.

  • Lindholm, E., Nickolls, J., Oberman, S., Montrym, J. 2008, IEEE Micro, 28, 39.

  • McLaughlin, J. A., De Moortel, I., Hood, A. W., Brady, C. S. 2009, AA, 493, 227.

  • Mumford, S., Fedun, V., Erdélyi, R. 2014, Generation of Magnetohydrodynamic Waves in Low Solar Atmospheric Flux Tubes by Photospheric Motions, ArXiv e-prints.

  • Orszag, S. A., Tang, C. M. 1979, Journal of Fluid Mechanics, 90, 129.

  • Pang, B., Pen, U.-L., Perrone, M. 2010, Magnetohydrodynamics on Heterogeneous Architectures: A Performance Comparison.

  • Pressman, R. S. 1997, Software Engineering (A Practitioner's Approach), McGraw-Hill.

  • Roe, P. L. 1981, Journal of Computational Physics, 43, 357.

  • Schive, H.-Y., Zhang, U.-H., Chiueh, T. 2011, Directionally Unsplit Hydrodynamic Schemes with Hybrid MPI/OpenMP/GPU Parallelization in AMR, ArXiv e-prints.

  • Scullion, E., Erdélyi, R., Fedun, V., Doyle, J. G. 2011, AJ, 743, 14.

  • Sedov, L. I. 1946, Journal of Applied Mathematics and Mechanics, 10, 241.

  • Selwa, M., Ofman, L., Murawski, K. 2007, AJL, 668, L83.

  • Shelyag, S., Fedun, V., Erdélyi, R. 2008, AA, 486, 655.

  • Sødergaard, P., Hansen, J. V. 1999, Numerical Computation with Shocks.

  • Stein, R. F., Nordlund, A. 1998, AJ, 499, 914.

  • Taylor, G. 1950, Proceedings of the Royal Society: A Mathematical Physical and Engineering Sciences, 201, 159.

  • Tóth, G. 1996, Astrophysical Letters and Communications, 34, 245.

  • Tóth, G. 2000, Journal of Computational Physics, 161, 605.

  • Tóth, G., Keppens, R., Botchev, M. A. 1998, AA, 332, 1159.

  • Vigeesh, G., Fedun, V., Hasan, S. S., Erdélyi, R. 2011, 3D Simulations of Magnetohydrodynamic Waves in the Magnetized Solar Atmosphere, ArXiv e-prints.

  • Wang, T., Ofman, L., Davila, J. M. 2013, AJL, 775, L23.

  • Whitehead, N., Fit-Florea, A. 2011, Precision and Performance: Floating point and IEEE 754 compliance for NVIDIA GPUs. NVIDIA white paper, 21 (10), 767.

  • Wienke, S., Springer, P., Terboven, C., an Mey, D. 2012, Lecture Notes in Computer Science, 7484, 859.

  • Wolfe, M. 2008, Compilers and More: A GPU and Accelerator Programming Model.

  • Wong, H. C., Wong, U. H., Feng, X., Tang, Z. 2011, Computer Physics Communications, 182, 2132.

  • Wong, U. H., Aoki, T., Wong, H. C. 2014a, Computer Physics Communications, 185, 1901.

  • Wong, U. H., Wong, H. C., Ma, Y. 2014b, Computer Physics Communications, 185, 144.

  • Zink, B. 2011, HORIZON: Accelerated general relativistic magnetohydrodynamics, ArXiv e-prints.


Acknowledgements

MKG acknowledges support from the White Rose Grid e-Science Centre and funding from the EPSRC contract EP/F057644/1. MKG acknowledges the support of NVIDIA for allowing testing of the K20 GPU through its early access program and for the donation of a K20 GPU. RE acknowledges M. Kéray for patient encouragement and is also grateful to NSF, Hungary (OTKA, Ref. No. K83133). The authors thank the Science and Technology Facilities Council (STFC), UK, for the support they received. They also acknowledge Anthony Brookfield and Corporate Information and Computing Services at The University of Sheffield, for the provision of the High Performance Computing Service.

Corresponding author

Correspondence to M. K. Griffiths.

Appendices

Appendix A. The full MHD equations

The full MHD equations, including the hyper-diffusion source terms, read as

$$ {{\partial \tilde{\rho}}\over{\partial t}}+\nabla\cdot\left(\textbf{v}(\rho_{\mathrm{b}} + \tilde{\rho})\right)=0+D_{\rho}(\tilde{\rho}), $$
(A1)
$$\begin{array}{@{}rcl@{}} &&{{\partial [ (\tilde{\rho}+\rho_{\mathrm{b}} ) \textbf{v}]}\over{\partial t}}+\nabla\cdot(\textbf{v}(\tilde{\rho} +\rho_{\mathrm{b}})\textbf{v}-\tilde{\mathbf{B}} \tilde{\mathbf{B}} )\\ && -\nabla [\tilde{\mathbf{B}}\textbf{B}_{\mathbf{b}}+\textbf{B}_{\mathbf{b}}\tilde{\mathbf{B}}] +\nabla \tilde{p}_{\mathrm{t}} =\tilde{\rho}\textbf{g}+\mathbf{D}_{\rho \mathrm{v}}[(\tilde{\rho }+\rho_{\mathrm{b}})\mathbf{v}], \end{array} $$
(A2)
$$\begin{array}{@{}rcl@{}} {{\partial \tilde{e}}\over{\partial t}}+\nabla\cdot(\textbf{v}(\tilde{e}+e_{\mathrm{b}})-\tilde{\mathbf{B}} \tilde{\mathbf{B}}\cdot\textbf{v}+\textbf{v}\tilde{p}_{\mathrm{t}})-\nabla [(\tilde{\mathbf{B}}\textbf{B}_{\mathrm{b}}+\textbf{B}_{\mathrm{b}}\tilde{\mathbf{B}})\cdot\mathbf{v}]\\ +p_{\text{tb}}\nabla \textbf{v}-\textbf{B}_{\mathrm{b}}\textbf{B}_{\mathrm{b}}\nabla \textbf{v}=\tilde{\rho}\textbf{g}\cdot\textbf{v}+D_{\mathrm{e}}(\tilde{e}), \end{array} $$
(A3)
$$ {{\partial\tilde{\textbf{B}}}\over{\partial t}} +\nabla \cdot(\textbf{v}(\tilde{\mathbf{B}} + \textbf{B}_{\mathrm{b}})-(\tilde{ \mathbf{B}} + \textbf{B}_{\mathrm{b}})\textbf{v})=0+\mathbf{D}_{\mathrm{B}}(\tilde{\mathbf{B}}), $$
(A4)

where

$$ \tilde{p}_{\mathrm{t}}=\tilde{p}_{\mathrm{k}}+{{\tilde{\textbf{B}}^{2}}\over{2}}+\textbf{B}_{\mathrm{b}}\tilde{\mathbf{B}}, $$
(A5)
$$ \tilde{p}_{\mathrm{k}}=(\gamma -1)\left(\tilde{e}-{{(\rho_{\mathrm{b}}+\tilde{\rho})\textbf{v}^{2}}\over{2}}- \textbf{B}_{\mathrm{b}}\tilde{\mathbf{B}}-{\tilde{\textbf{B}^{2}}\over{2}}\right), $$
(A6)
$$ p_{\text{tb}}=p_{\text{kb}}+{{\textbf{B}_{\mathrm{b}}^{2}}\over{2}}, $$
(A7)
$$ p_{\text{kb}}=(\gamma -1)\left(e_{\mathrm{b}}-{{\textbf{B}_{\mathrm{b}}^{2}}\over{2}} \right). $$
(A8)

The MHD equations for momentum (A2) and energy (A3) are re-ordered by taking into account the magnetohydrostatic equilibrium condition. It was found advantageous to use this apparently more complex form of the MHD equations for numerical simulations of processes in a gravitationally stratified plasma. The 0+ term in equations (A1) and (A4) indicates that the right-hand sides of these expressions are zero without the inclusion of the numerical diffusion terms. It is important to note that the fully perturbed expressions include diffusive source terms on the right-hand side. These terms are employed to stabilise the computational solutions using the proven hyper-diffusion method (Caunt & Korpi 2001; Stein & Nordlund 1998). A helpful discussion of the hyper-diffusion method and its implementation is provided by Sødergaard and Hansen (1999). The numerical algorithm implemented to solve equations (A1)–(A8) is based on a fourth-order central differencing method for the spatial derivatives and a second- or fourth-order Runge–Kutta solver for the time integration. It is important to note that, since we use central differencing, the quantity ∇⋅B is conserved and may be constrained to zero. Schemes preserving the constraint ∇⋅B = 0 have been reviewed by Tóth (2000). Hyper-diffusion is used to control the stability of the solutions at discontinuities and shock fronts in equations (A1)–(A4). The additional hyper-diffusive term for the perturbed density is written as

$$ D_{\rho}={\sum}_{i} {{\partial}\over{\partial x_{i}}}\nu_{i}(\rho) {{\partial}\over{\partial x_{i}}}\tilde{\rho}. $$
(A9)
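As a minimal illustration of the discretisation described above, the following Python/NumPy sketch applies a fourth-order central difference to a one-dimensional periodic field and assembles the hyper-diffusive density term of equation (A9). This is not code taken from SMAUG or SAC; the grid, the constant coefficient ν and the periodic boundary treatment are assumptions made purely for illustration.

```python
import numpy as np

def central_diff4(u, dx):
    """Fourth-order central difference du/dx on a 1D periodic grid."""
    return (np.roll(u, 2) - 8.0 * np.roll(u, 1)
            + 8.0 * np.roll(u, -1) - np.roll(u, -2)) / (12.0 * dx)

def hyperdiffusive_density(rho_tilde, nu, dx):
    """D_rho of eq. (A9) in one dimension:
    D_rho = d/dx ( nu * d(rho_tilde)/dx )."""
    return central_diff4(nu * central_diff4(rho_tilde, dx), dx)

# Usage on a periodic domain [0, 1): values below are illustrative only.
nx = 256
dx = 1.0 / nx
x = np.arange(nx) * dx
rho_tilde = 0.01 * np.sin(2.0 * np.pi * x)   # assumed density perturbation
nu = np.full(nx, 1.0e-3)                     # assumed constant hyper-diffusion coefficient
print(hyperdiffusive_density(rho_tilde, nu, dx)[:4])
```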

The term \(D_{\mathrm{e}}\) in the energy equation (A3) consists of three parts:

$$ D_{\mathrm{e}}=D_{\mathrm{e}}^{\text{diffusive}}+D_{\mathrm{e}}^{\text{viscous}}+D_{\mathrm{e}}^{\text{ohmic}}, $$
(A10)

which describe thermal diffusion, viscous and ohmic heating of the plasma, respectively,

$$ D_{\mathrm{e}}^{\text{diffusive}}={\sum}_{i} {{\partial}\over{\partial x_{i}}}\nu_{i}(e) {{\partial}\over{\partial x_{i}}}\tilde{\epsilon}, $$
(A11)

where \(\tilde {\epsilon }\) is the thermal energy perturbation,

$$ \tilde{\epsilon}=\tilde{e}-(\rho_{\mathrm{b}}+\tilde{\rho}){{\textbf{v}^{2}}\over{2}}-{{\tilde{\mathbf{B}^{2}}}\over{2}}. $$
(A12)

The hyperviscous and ohmic heating terms in equation (A10) are set as follows:

$$ D_{\mathrm{e}}^{\text{viscous}}=\nabla \cdot (\textbf{v}\cdot \tau ), $$
(A13)

and

$$ D_{\mathrm{e}}^{\text{ohmic}}=\nabla \cdot (\textbf{B}\times \boldsymbol{\upvarepsilon} ). $$
(A14)

For the vector variables the hyper-diffusive terms are

$$ \textbf{D}_{\rho \mathrm{v}}=\nabla \cdot \mathbf{\tau}, $$
(A15)

and

$$ \textbf{D}_{\textbf{B}}= -\nabla \times \boldsymbol{\upvarepsilon}. $$
(A16)

Here, τ is the viscous stress tensor

$$ \tau_{kl}={{1}\over{2}}(\tilde{\rho}+\rho_{\mathrm{b}})\left[ \nu_{k}(v_{l}) {{\partial v_{l}}\over{\partial x_{k}}} + \nu_{l}(v_{k}) {{\partial v_{k}}\over{\partial x_{l}}} \right], $$
(A17)

and ε is defined by

$$ \varepsilon_{k}=\epsilon_{klm}\left[ \nu_{l}(\tilde{B}_{m}) {{\partial \tilde{B}_{m}}\over{\partial x_{l}}} \right], $$
(A18)

where \(\epsilon_{klm}\) is the Levi-Civita symbol and summation over the indices l and m is implied. The hyper-viscosity coefficient for the variable u in the i-th direction is

$$ \nu_{i}(u)={c_{2}^{u}}{\Delta} x_{i}v_{t}{{\max \left| {{\Delta}_{i}^{3}} u\right|}\over{\max \left| {{\Delta}_{i}^{1}}u\right|}}, $$
(A19)

where \(v_{\mathrm{t}}=v_{\mathrm{a}}+v_{\mathrm{s}}\) is the sum of the maximum Alfvén and sound speeds in the domain, and \({{\Delta}^{3}_{i}}\) and \({{\Delta}^{1}_{i}}\) are forward difference operators of third and first order, respectively, in the i-th direction. The spatial resolution is given by \({\Delta} x_{i}\). The coefficient \({c_{2}^{u}}\) is set so as to ensure numerical stability of the solution. The SAC code uses the MPI version of the VAC software (Tóth 1996), which includes MPI domain decomposition, as the basis for its implementation. However, this MPI implementation had to be generalized to include the exchange of the hyper-diffusive contributions between neighbouring domains.
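A short sketch of how the coefficient in equation (A19) could be evaluated is given below, assuming a one-dimensional periodic grid. The forward-difference stencils, the value of \(c_{2}^{u}\) and the guard against a vanishing denominator are illustrative assumptions and do not reproduce SMAUG's actual implementation.

```python
import numpy as np

def hyper_nu(u, dx, v_t, c2=0.4):
    """Hyper-viscosity coefficient of eq. (A19):
        nu(u) = c2 * dx * v_t * max|Delta3 u| / max|Delta1 u|
    Delta1 and Delta3 are first- and third-order forward differences;
    c2 = 0.4 and the periodic wrap-around are assumptions for illustration."""
    d1 = np.roll(u, -1) - u
    d3 = np.roll(u, -3) - 3.0 * np.roll(u, -2) + 3.0 * np.roll(u, -1) - u
    max_d1 = np.max(np.abs(d1))
    if max_d1 < 1.0e-30:          # uniform field: no diffusion needed
        return 0.0
    return c2 * dx * v_t * np.max(np.abs(d3)) / max_d1

# A smooth field yields a small coefficient; a near-discontinuity a larger one.
x = np.linspace(0.0, 1.0, 200, endpoint=False)
dx, v_t = x[1] - x[0], 2.0        # v_t: assumed maximum Alfven + sound speed
print(hyper_nu(np.sin(2.0 * np.pi * x), dx, v_t))
print(hyper_nu(np.tanh((x - 0.5) / 0.01), dx, v_t))
```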

Appendix B. Verification tests

B1. Brio and Wu shock tube

The Brio and Wu shock tube is a 1D ideal MHD test problem in which the initial conditions feature a discontinuity at the centre of the configuration, i.e., the left and right states are initialized to different values at x = 0.5 (Brio & Wu 1988). On either side of the discontinuity the initial parameters are: \(p_{\mathrm{l}}=1\), \(p_{\mathrm{r}}=0.1\), \(\rho_{\mathrm{l}}=1\), \(\rho_{\mathrm{r}}=0.125\), \(B_{y\mathrm{l}}=1\), \(B_{y\mathrm{r}}=-1\), \(B_{x}=0.75\). Brio and Wu used an upwind differencing scheme and a linearization method, known as the Roe procedure, to solve a Riemann-type problem (Roe 1981). The Riemann problem is the solution of a conserved system whose states are separated by a discontinuity. In the method employed by Brio and Wu, the exact solution is approximated by a linearised version which is averaged on either side of the discontinuity. Running the problem on a numerical domain with 800 grid points gives excellent agreement with the original SAC code results (Shelyag et al. 2008). Figure B1 exhibits the characteristic features of the solution: the slow compound wave at x = 0.475, the contact discontinuity at x = 0.55, the slow shock at x = 0.65 and the fast rarefaction at x = 0.85. The validity of the code was further checked by running the numerical simulation along the x, y and z axes in turn; no dependence on the orientation of the Brio–Wu shock tube problem was found.

Figure B1

A snapshot of the Brio and Wu problem numerical solution taken at time t = 0.1. The density, tangential velocity, normal velocity and tangential magnetic field are shown.
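For reference, the left and right states listed above can be set up as in the short sketch below. This is only an illustration of the initial condition, not SMAUG input: the array layout, the conversion to total energy density and the adiabatic index γ = 2 (the value used by Brio & Wu) are assumptions made for this example.

```python
import numpy as np

def brio_wu_initial_state(nx=800, gamma=2.0):
    """Left/right states of the Brio & Wu (1988) shock tube on a uniform
    1D grid with the discontinuity at x = 0.5."""
    x = (np.arange(nx) + 0.5) / nx
    left = x < 0.5
    rho = np.where(left, 1.0, 0.125)
    p = np.where(left, 1.0, 0.1)
    bx = np.full(nx, 0.75)
    by = np.where(left, 1.0, -1.0)
    vx = np.zeros(nx)
    vy = np.zeros(nx)
    # Total energy density: internal + kinetic + magnetic.
    e = p / (gamma - 1.0) + 0.5 * rho * (vx**2 + vy**2) + 0.5 * (bx**2 + by**2)
    return x, rho, vx, vy, bx, by, e

x, rho, vx, vy, bx, by, e = brio_wu_initial_state()
print(rho[0], rho[-1], by[0], by[-1])   # -> 1.0 0.125 1.0 -1.0
```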

B2. Orszag–Tang vortex

The complexities of the Orszag–Tang vortex present a challenging problem with which the code may be further validated in two-dimensional geometry (Orszag & Tang 1979). The code must be sufficiently robust to handle supersonic MHD turbulence. For this test, a 512 × 512 simulation box was used and the boundaries of the computational domain were set to be periodic. This problem also provides an excellent opportunity to test the background terms, by assigning the initial density, magnetic field and internal energy to the respective background fields.
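The sketch below sets up one common normalisation of the Orszag–Tang initial state (ρ = γ², p = γ, v = (−sin y, sin x), B = (−sin y, sin 2x) on a periodic [0, 2π]² box). The scaling used in the SAC/SMAUG runs may differ, so the constants here are assumptions for illustration only.

```python
import numpy as np

def orszag_tang_initial_state(n=512, gamma=5.0 / 3.0):
    """One common normalisation of the Orszag-Tang vortex on a periodic
    [0, 2*pi]^2 grid; constants are illustrative assumptions."""
    s = (np.arange(n) + 0.5) * 2.0 * np.pi / n
    x, y = np.meshgrid(s, s, indexing="ij")
    rho = np.full((n, n), gamma**2)
    p = np.full((n, n), gamma)
    vx, vy = -np.sin(y), np.sin(x)
    bx, by = -np.sin(y), np.sin(2.0 * x)
    # Total energy density: internal + kinetic + magnetic.
    e = p / (gamma - 1.0) + 0.5 * rho * (vx**2 + vy**2) + 0.5 * (bx**2 + by**2)
    return rho, vx, vy, bx, by, e

rho, vx, vy, bx, by, e = orszag_tang_initial_state()
print(rho.shape, float(e.mean()))
```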

The density distributions at t = 0.1, 0.26, 0.42 and 0.58 s generated by SMAUG are shown in Figure B2 and demonstrate very good agreement with the corresponding output of SAC (Shelyag et al. 2008). The results demonstrate the robustness of the code, which is able to resolve well-defined shock fronts.

Figure B2

Orszag–Tang vortex problem computed with SMAUG on a 512 × 512 grid. The density distribution at t = 0.1 s (top left), 0.26 s (top right), 0.42 s (bottom left) and 0.58 s (bottom right) is shown.

B3. Blast waves in 2D and 3D

This section extends the testing of SMAUG from two dimensions to a three-dimensional grid, with an emphasis on well-characterised blast problems. A blast wave in hydrodynamics is the pressure and flow field which results from the release of a highly localized amount of energy; a good astrophysical example is the explosion of a supernova. The comparison between SAC and SMAUG using blast wave problems was particularly useful for checking the extension of the application from 2D to 3D. By employing the Sedov shock wave, we were also able to concentrate the hyper-diffusion tests on the momentum terms. Additionally, the Sedov shock wave problem tests the code's capability to model spherically symmetric systems using Cartesian geometry. A solution of the blast wave problem, known as the similarity solution, was developed independently by Taylor (1950) and Sedov (1946). The similarity analysis provides a prediction of the radius of the shock front as a function of time, i.e.,

$$ R(t)=\left({{E}\over{\rho_{0}}}\right)^{{1}\over{5}}t^{{2}\over{5}}, $$
(B1)

where E is the energy released by the explosion and \(\rho_{0}\) is the density of the surrounding medium. The similarity solution can be derived by starting from the following expression for the radius of the shock front:

$$ R(t)=E^{\zeta}\rho_{0}^{\eta}t^{\xi}. $$
(B2)

By requiring that both sides of equation (B2) have the same dimensions, the indices ζ, η and ξ can be determined, recovering the similarity solution given by equation (B1). The simulations were performed using a 128 × 128 × 128 numerical domain with sides of length 1.0 and a surrounding medium of uniform density 1.0 (Figure B3). The initial pressure was set to P = 0.1, except within a region of radius r < 0.1, where the pressure was set to 10.0. The initial velocity was zero and the adiabatic index was set to 5/3.
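The dimensional-analysis step leading from equation (B2) to equation (B1) can be written as a small linear system; the sketch below solves it and evaluates the predicted shock radius. The explosion energy, density and time used in the call are placeholder values for illustration, not the parameters of the run shown in Figure B3.

```python
import numpy as np

def sedov_exponents():
    """Determine zeta, eta, xi in R = E**zeta * rho0**eta * t**xi (eq. B2)
    by matching the dimensions of mass, length and time on both sides."""
    a = np.array([[1.0,  1.0, 0.0],    # mass:    zeta + eta       = 0
                  [2.0, -3.0, 0.0],    # length:  2*zeta - 3*eta   = 1
                  [-2.0, 0.0, 1.0]])   # time:   -2*zeta + xi      = 0
    return np.linalg.solve(a, np.array([0.0, 1.0, 0.0]))

def sedov_radius(t, energy, rho0):
    """Shock-front radius of eq. (B1): R(t) = (E / rho0)**(1/5) * t**(2/5)."""
    return (energy / rho0) ** 0.2 * t ** 0.4

print(sedov_exponents())                            # -> [ 0.2 -0.2  0.4]
print(sedov_radius(t=0.05, energy=1.0, rho0=1.0))   # placeholder values
```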

Figure B3

The result of the simulation of the Sedov blast test problem on a 128 × 128 × 128 numerical grid with SMAUG. The energy distribution is shown after 1000 (top left), 10 000 (top right), 20 000 (bottom left) and 30 000 (bottom right) iterations.

Cite this article

Griffiths, M.K., Fedun, V. & Erdélyi, R. A Fast MHD Code for Gravitationally Stratified Media using Graphical Processing Units: SMAUG. J Astrophys Astron 36, 197–223 (2015). https://doi.org/10.1007/s12036-015-9328-y
