Abstract
The task of integrating a large number of independent ODE systems arises in various scientific and engineering areas. For nonstiff systems, common explicit integration algorithms can be used on GPUs, where individual GPU threads concurrently integrate independent ODEs with different initial conditions or parameters. One example is the fifth-order adaptive Runge–Kutta–Cash–Karp (RKCK) algorithm. For stiff ODEs, standard explicit algorithms require impractically small time steps to remain stable, so implicit algorithms are commonly used instead to allow larger time steps and reduce the computational expense. However, typical high-order implicit algorithms based on backward differentiation formulae (e.g., VODE, LSODE) involve complex logical flow that causes severe thread divergence when implemented on GPUs, limiting performance. Alternative algorithms are therefore needed. According to results reported in the literature, a GPU-based Runge–Kutta–Chebyshev (RKC) algorithm can handle moderate levels of stiffness and runs significantly faster than both an equivalent CPU version and a CPU-based implicit algorithm (VODE). In this chapter, we present the mathematical background, implementation details, and source code for the RKCK and RKC algorithms for integrating large numbers of independent systems of ODEs on GPUs. In addition, brief performance comparisons are shown for each algorithm, demonstrating the potential benefit of moving to GPU-based ODE integrators.
Notes
1. An equivalence ratio of one indicates a stoichiometric mixture, in which fuel and oxidizer are present in the proportion required for complete combustion.
References
Alexandrov, V., Sameh, A., Siddique, Y., Zlatev, Z.: Numerical integration of chemical ODE problems arising in air pollution models. Environ. Monit. Assess. 2(4), 365–377 (1997). doi:10.1023/A:1019086016734
Barry, D., Miller, C., Culligan, P., Bajracharya, K.: Analysis of split operator methods for nonlinear and multispecies groundwater chemical transport models. Math. Comput. Simul. 43(3–6), 331–341 (1997). doi:10.1016/S0378-4754(97)00017-7
Barry, D., Bajracharya, K., Crapper, M., Prommer, H., Cunningham, C.: Comparison of split-operator methods for solving coupled chemical non-equilibrium reaction/groundwater transport models. Math. Comput. Simul. 53(1–2), 113–127 (2000). doi:10.1016/S0378-4754(00)00182-8
Cash, J.R., Karp, A.H.: A variable order Runge–Kutta method for initial value problems with rapidly varying right-hand sides. ACM Trans. Math. Softw. 16(3), 201–222 (1990)
Day, M.S., Bell, J.B.: Numerical simulation of laminar reacting flows with complex chemistry. Combust. Theory Model. 4(4), 535–556 (2000). doi:10.1088/1364-7830/4/4/309
Dematte, L., Prandi, D.: GPU computing for systems biology. Brief. Bioinform. 11(3), 323–333 (2010). doi:10.1093/bib/bbq006
Geršgorin, S.: Über die Abgrenzung der Eigenwerte einer Matrix. Bulletin de l’Académie des Sciences de l’URSS. Classe des sciences mathématiques et na (6), 749–754 (1931)
Hairer, E., Wanner, G.: Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems, 2nd edn. Springer Series in Computational Mathematics, vol. 14. Springer, Berlin/Heidelberg (1996)
Hairer, E., Wanner, G., Nørsett, S.P.: Solving Ordinary Differential Equations I: Nonstiff Problems, 2nd edn. Springer Series in Computational Mathematics, vol. 8. Springer, Berlin/Heidelberg (1993). doi:10.1007/978-3-540-78862-1
Helton, J., Davis, F.: Latin hypercube sampling and the propagation of uncertainty in analyses of complex systems. Reliab. Eng. Syst. Saf. 81(1), 23–69 (2003). doi:10.1016/S0951-8320(03)00058-9
Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, Cambridge (1990)
Jang, B., Schaa, D., Mistry, P., Kaeli, D.: Exploiting memory access patterns to improve memory performance in data-parallel architectures. IEEE Trans. Parallel Distrib. Syst. 22(1), 105–118 (2011). doi:10.1109/TPDS.2010.107
Kim, J., Cho, S.Y.: Computation accuracy and efficiency of the time-splitting method in solving atmospheric transport/chemistry equations. Atmos. Environ. 31(15), 2215–2224 (1997)
Kirk, D.B., Hwu, W.W.: Programming Massively Parallel Processors: A Hands-on Approach. Morgan Kaufmann, Burlington (2010)
Knio, O.M., Najm, H.N., Wyckoff, P.S.: A semi-implicit numerical scheme for reacting flow II. Stiff, operator-split formulation. J. Comput. Phys. 154, 428–467 (1999). doi:10.1006/jcph.1999.6322
Kühn, C., Wierling, C., Kühn, A., Klipp, E., Panopoulou, G., Lehrach, H., Poustka, A.: Monte Carlo analysis of an ODE model of the sea urchin endomesoderm network. BMC Syst. Biol. 3, 83 (2009). doi:10.1186/1752-0509-3-83
Law, C.K.: Combustion Physics. Cambridge University Press, New York (2006)
Marino, S., Hogue, I.B., Ray, C.J., Kirschner, D.E.: A methodology for performing global uncertainty and sensitivity analysis in systems biology. J. Theor. Biol. 254(1), 178–196 (2008). doi:10.1016/j.jtbi.2008.04.011
Marinov, N.M.: A detailed chemical kinetic model for high temperature ethanol oxidation. Int. J. Chem. Kinet. 31(3), 183–220 (1999)
Mazzia, F., Magherini, C.: Test Set for Initial Value Problem Solvers, Release 2.4. Department of Mathematics, University of Bari and INdAM, Research Unit of Bari (2008). Available at http://www.dm.uniba.it/~testset
Niemeyer, K.E., Sung, C.J.: Accelerating moderately stiff chemical kinetics in reactive-flow simulations using GPUs. J. Comput. Phys. 256, 854–871 (2014). doi:10.1016/j.jcp.2013.09.025
Niemeyer, K.E., Sung, C.J., Fotache, C.G., Lee, J.C.: Turbulence-chemistry closure method using graphics processing units: a preliminary test. In: 7th Fall Technical Meeting of the Eastern States Section of the Combustion Institute, Storrs (2011)
Nimmagadda, V.K., Akoglu, A., Hariri, S., Moukabary, T.: Cardiac simulation on multi-GPU platform. J. Supercomput. 59(3), 1360–1378 (2011). doi:10.1007/s11227-010-0540-x
OpenMP Architecture Review Board: OpenMP Application Program Interface Version 3.0. http://www.openmp.org/mp-documents/spec30.pdf (2008)
Oran, E.S., Boris, J.P.: Numerical Simulation of Reactive Flow, 2nd edn. Cambridge University Press, Cambridge (2001)
Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical Recipes in Fortran 77: The Art of Scientific Computing, 2nd edn. Cambridge University Press, Cambridge (1992)
Ren, Z., Pope, S.B.: Second-order splitting schemes for a class of reactive systems. J. Comput. Phys. 227(17), 8165–8176 (2008). doi:10.1016/j.jcp.2008.05.019
Schwer, D., Lu, P., Green, W.H., Semiao, V.: A consistent-splitting approach to computing stiff steady-state reacting flows with adaptive chemistry. Combust. Theory Model. 7(2), 383–399 (2003). doi:10.1088/1364-7830/7/2/310
Shi, Y., Green, W.H., Wong, H., Oluwole, O.O.: Accelerating multi-dimensional combustion simulations using hybrid CPU-based implicit/GPU-based explicit ODE integration. Combust. Flame 159(7), 2388–2397 (2012). doi:10.1016/j.combustflame.2012.02.016
Sommeijer, B.P., Shampine, L.F., Verwer, J.G.: RKC: an explicit solver for parabolic PDEs. J. Comput. Appl. Math. 88(2), 315–326 (1997)
Sportisse, B.: An analysis of operator splitting techniques in the stiff case. J. Comput. Phys. 161(1), 140–168 (2000)
Stone, C.P., Davis, R.L.: Techniques for solving stiff chemical kinetics on graphical processing units. J. Propulsion Power 29(4), 764–773 (2013). doi:10.2514/1.B34874
Strang, G.: On the construction and comparison of difference schemes. SIAM J. Numer. Anal. 5(3), 506–517 (1968)
Sundnes, J., Nielsen, B.F., Mardal, K., Cai, X., Lines, G., Tveito, A.: On the computational complexity of the bidomain and the monodomain models of electrophysiology. Ann. Biomed. Eng. 34(7), 1088–1097 (2006). doi:10.1007/s10439-006-9082-z
van der Houwen, P.J.: The development of Runge–Kutta methods for partial differential equations. Appl. Numer. Math. 20, 261–272 (1996)
van der Houwen, P.J., Sommeijer, B.P.: On the internal stability of explicit, m-stage Runge-Kutta methods for large m-values. Z. Angew. Math. Mech. 60(10), 479–485 (1980)
Verwer, J.G.: Explicit Runge–Kutta methods for parabolic partial differential equations. Appl. Numer. Math. 22, 359–379 (1996)
Verwer, J.G., Hundsdorfer, W., Sommeijer, B.P.: Convergence properties of the Runge–Kutta–Chebyshev method. Numer. Math. 57, 157–178 (1990)
Verwer, J.G., Sommeijer, B.P., Hundsdorfer, W.: RKC time-stepping for advection–diffusion–reaction problems. J. Comput. Phys. 201(1), 61–79 (2004). doi:10.1016/j.jcp.2004.05.002
Zhou, Y., Liepe, J., Sheng, X., Stumpf, M.P.H., Barnes, C.: GPU accelerated biochemical network simulation. Bioinformatics 27(6), 874–876 (2011). doi:10.1093/bioinformatics/btr015
Acknowledgements
This work was supported by the US Department of Defense through the National Defense Science and Engineering Graduate Fellowship program, the National Science Foundation Graduate Research Fellowship under grant number DGE-0951783, and the Combustion Energy Frontier Research Center—an Energy Frontier Research Center funded by the US Department of Energy, Office of Science, Office of Basic Energy Sciences under award number DE-SC0001198.
Appendix
Various methods may be used to calculate the spectral radius of the Jacobian matrix, including the Gershgorin circle theorem [7, 11], which provides an upper-bound estimate. Here, we provide a function based on a nonlinear power method [30].
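In equation form, a nonlinear power iteration of this kind can be summarized roughly as follows. The notation here is an assumption made for illustration, not the chapter's own: f is the ODE right-hand side, y the current state, δ a small perturbation size (on the order of ‖y‖√u, with u the unit roundoff), v_k the perturbed state at iteration k, and σ_k the resulting estimate of the spectral radius ρ.

```latex
\begin{aligned}
\sigma_k &= \frac{\lVert f(t, v_k) - f(t, y) \rVert_2}{\delta}, \\
v_{k+1} &= y + \delta \, \frac{f(t, v_k) - f(t, y)}{\lVert f(t, v_k) - f(t, y) \rVert_2}, \\
\rho &\approx 1.2 \, \sigma_k
  \quad \text{once successive estimates } \sigma_k \text{ agree to within about } 1\%.
\end{aligned}
```

Each iteration costs one right-hand-side evaluation and needs no explicit Jacobian, because the difference f(t, v_k) − f(t, y) approximates the Jacobian applied to a step of length δ.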
// Estimates the spectral radius of the Jacobian at (t, y) with a nonlinear
// power method. NEQN (number of equations), UROUND (unit roundoff), and
// dydt (right-hand-side function) are assumed to be defined elsewhere.
//   F  : right-hand side evaluated at (t, y)
//   v  : on entry, initial guess for the dominant eigenvector;
//        on a converged exit, the updated eigenvector estimate
//   Fv : work array of length NEQN
__device__ double
rkcSpecRad (const double t, const double* y, const double g, const double* F,
            const double hMax, double* v, double* Fv) {
    // maximum number of iterations
    const int itmax = 50;

    // floor used in the convergence test
    const double small = 1.0 / hMax;

    // Euclidean norms of the state y and of the initial guess v
    double nrm1 = 0.0;
    double nrm2 = 0.0;
    for (int i = 0; i < NEQN; ++i) {
        nrm1 += (y[i] * y[i]);
        nrm2 += (v[i] * v[i]);
    }
    nrm1 = sqrt(nrm1);
    nrm2 = sqrt(nrm2);

    // build the perturbed state v = y + (perturbation of size dynrm)
    double dynrm;
    if ((nrm1 != 0.0) && (nrm2 != 0.0)) {
        dynrm = nrm1 * sqrt(UROUND);
        for (int i = 0; i < NEQN; ++i) {
            v[i] = y[i] + v[i] * (dynrm / nrm2);
        }
    } else if (nrm1 != 0.0) {
        dynrm = nrm1 * sqrt(UROUND);
        for (int i = 0; i < NEQN; ++i) {
            v[i] = y[i] * (1.0 + sqrt(UROUND));
        }
    } else if (nrm2 != 0.0) {
        dynrm = UROUND;
        for (int i = 0; i < NEQN; ++i) {
            v[i] *= (dynrm / nrm2);
        }
    } else {
        dynrm = UROUND;
        for (int i = 0; i < NEQN; ++i) {
            v[i] = UROUND;
        }
    }

    // now iterate using nonlinear power method
    double sigma = 0.0;
    for (int iter = 1; iter <= itmax; ++iter) {

        // right-hand side at the perturbed state
        dydt (t, g, v, Fv);

        nrm1 = 0.0;
        for (int i = 0; i < NEQN; ++i) {
            nrm1 += ((Fv[i] - F[i]) * (Fv[i] - F[i]));
        }
        nrm1 = sqrt(nrm1);

        // keep the previous estimate, then form the new one
        nrm2 = sigma;
        sigma = nrm1 / dynrm;

        // converged once successive estimates agree to within 1%
        if ((iter >= 2) && (fabs(sigma - nrm2) <= (fmax(sigma, small) * 0.01))) {
            // store the eigenvector estimate (v - y) for later reuse
            for (int i = 0; i < NEQN; ++i) {
                v[i] -= y[i];
            }
            // the factor 1.2 makes the result more likely to be an upper bound
            return (1.2 * sigma);
        }

        if (nrm1 != 0.0) {
            // next perturbed state, rescaled to size dynrm
            for (int i = 0; i < NEQN; ++i) {
                v[i] = y[i] + ((Fv[i] - F[i]) * (dynrm / nrm1));
            }
        } else {
            // zero difference: flip one component of the perturbation and retry
            int ind = (iter % NEQN);
            v[ind] = y[ind] - (v[ind] - y[ind]);
        }
    }
    return (1.2 * sigma);
}
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Niemeyer, K.E., Sung, C.J. (2014). GPU-Based Parallel Integration of Large Numbers of Independent ODE Systems. In: Kindratenko, V. (ed.) Numerical Computations with GPUs. Springer, Cham. https://doi.org/10.1007/978-3-319-06548-9_8
DOI: https://doi.org/10.1007/978-3-319-06548-9_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06547-2
Online ISBN: 978-3-319-06548-9
eBook Packages: Computer Science (R0)