Abstract
We consider the HPC challenge of fault tolerance in the context of plasma physics simulations using the sparse grid combination technique. In the combination technique formalism, one breaks down a single, highly expensive simulation into many considerably cheaper, independent simulations that are propagated in time and then combined to approximate the result of the full simulation. This introduces a new level of parallelism from which various fault tolerance approaches can be derived. We investigate two such approaches, corresponding to two different simulation modes of the plasma physics code GENE: the simulation of a time-dependent, 5-dimensional PDE, and the computation of certain eigenvalues of the spectrum of a problem-specific linear operator. This paper makes two main contributions to the field of fault tolerance with the combination technique. First, we show that the recently developed fault-tolerant combination technique performs well even for highly complex simulation codes, i.e., beyond the usual Poisson or advection problems. Second, we demonstrate a new way to use the optimized combination technique (OptiCom) in the context of fault tolerance when dealing with eigenvalue computations. This work is a building block of the project EXAHD within the DFG’s Priority Programme “Software for Exascale Computing” (SPPEXA).
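To make the idea of combining many cheap simulations concrete, the following Python sketch computes combination coefficients for a set of component grids and recomputes them after one grid is lost. It is a minimal illustration under stated assumptions, not the implementation used with GENE: the function names are hypothetical, levels are assumed to start at 1, the index set is assumed to be downward closed, and the fault handling simply discards the failed grid together with every finer grid above it, which is a simplification of the fault-tolerant combination technique.

```python
from itertools import product


def geq(alpha, beta):
    """Componentwise order on level vectors: alpha >= beta."""
    return all(a >= b for a, b in zip(alpha, beta))


def classical_index_set(dim, n):
    """Level vectors l with l_i >= 1 and |l|_1 <= n + dim - 1 (assumed convention)."""
    levels = range(1, n + 1)
    return {l for l in product(levels, repeat=dim) if sum(l) <= n + dim - 1}


def combination_coefficients(index_set):
    """Inclusion-exclusion coefficients for a downward-closed index set.

    c_l = sum over z in {0,1}^d with l + z in the set of (-1)^{|z|_1}.
    Only nonzero coefficients are returned.
    """
    if not index_set:
        return {}
    dim = len(next(iter(index_set)))
    coeffs = {}
    for l in index_set:
        c = 0
        for z in product((0, 1), repeat=dim):
            neighbour = tuple(li + zi for li, zi in zip(l, z))
            if neighbour in index_set:
                c += (-1) ** sum(z)
        if c != 0:
            coeffs[l] = c
    return coeffs


def recombine_after_fault(index_set, failed):
    """Drop each failed grid and all grids above it, then recompute coefficients.

    Removing the upward closure keeps the remaining set downward closed,
    so the inclusion-exclusion formula still applies.
    """
    surviving = {l for l in index_set if not any(geq(l, f) for f in failed)}
    return combination_coefficients(surviving)


grids = classical_index_set(dim=2, n=3)
print(combination_coefficients(grids))
print(recombine_after_fault(grids, failed=[(2, 2)]))
```

In this two-dimensional example the original combination uses the grids with level sum 4 (coefficient +1) and level sum 3 (coefficient -1). After losing grid (2, 2), the recomputed combination uses (1, 3) and (3, 1) with coefficient +1 and (1, 1) with coefficient -1. Grid (1, 1) had coefficient 0 before the fault, so in this sketch it would have to be available already or be computed on demand.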
Notes
- 1. Two level vectors \(\boldsymbol{\alpha },\boldsymbol{\beta }\) satisfy \(\boldsymbol{\alpha }\geq \boldsymbol{\beta }\) if \(\alpha _{i} \geq \beta _{i}\) for all \(i \in \{ 1,\ldots ,d\}\).
- 2. Modulo the treatment of the boundary, which in some dimensions might add ± 1 discretization points.
- 3.
Acknowledgements
This work was supported (in part) by the German Research Foundation (DFG) through the Priority Programme 1648 “Software for Exascale Computing” (SPPEXA), along with the support of the Technische Universität München – Institute for Advanced Study, funded by the German Excellence Initiative (and the European Union Seventh Framework Programme under grant agreement n° 291763). D. Pflüger further acknowledges the financial support of the DFG within the Cluster of Excellence in Simulation Technology (EXC 310/1), and A. Parra Hinojosa thanks CONACYT, Mexico, for its support.
Appendix: GENE Parameter File
&box
kymin = 0.3
lv = 3.00
lw = 9.00
lx = 4.16667
adapt_lx = T
mu_grid_type = 'clenshaw_curtis'
/
&general
nonlinear = F
arakawa_zv = T
arakawa_zv_order = 2
calc_dt = F
dt_max = 7.39E-4
courant = 1.0
beta = 0.1E-02
debye2 = 0.0
collision_op = 'none'
init_cond = 'fb'
hyp_z = 2.000
hyp_v = 0.5000
/
&geometry
magn_geometry = 'circular'
q0 = 1.4
shat = 0.8
trpeps = 0.18
major_R = 1.0
norm_flux_projection = F
/
&species
name = 'ions'
omn = 2.0
omt = 4.5
mass = 1.0
temp = 1.0
dens = 1.0
charge = 1
/
&species
name = 'electrons'
omn = 2.0
omt = 3.5
mass = 0.27E-03
temp = 1.5
dens = 1.0
charge = -1
/
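The parameter file above is a sequence of Fortran namelist groups, each opened by `&name` and closed by `/`, with the `&species` group repeated once per particle species. The following Python sketch shows one way such a file could be read for inspection. It is an illustrative helper under stated assumptions and not part of GENE: `parse_namelist` and `convert` are hypothetical names, and only the simple one-assignment-per-line form used in this appendix is handled.

```python
def convert(value):
    """Convert a namelist value string to bool, str, int or float."""
    if value in ("T", ".T.", ".true."):
        return True
    if value in ("F", ".F.", ".false."):
        return False
    if value[:1] in ("'", '"'):
        return value.strip("'\"").strip()
    try:
        return int(value)
    except ValueError:
        # Handles forms such as 7.39E-4; Fortran 'D' exponents are not covered.
        return float(value)


def parse_namelist(text):
    """Return a list of (group_name, parameters) pairs in file order.

    A list is used instead of a dict because group names may repeat,
    as with the two &species groups.
    """
    groups = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("&"):
            current = (line[1:].strip(), {})
            groups.append(current)
        elif line == "/":
            current = None
        elif current is not None and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            current[1][key] = convert(value)
    return groups


# Excerpt of the parameter file above, used as sample input.
sample = """
&general
nonlinear = F
dt_max = 7.39E-4
init_cond = 'fb'
/
&species
name = 'ions'
charge = 1
/
&species
name = 'electrons'
mass = 0.27E-03
charge = -1
/
"""

for name, params in parse_namelist(sample):
    print(name, params)
```

Keeping the groups in an ordered list preserves both species entries, which a plain dictionary keyed by group name would overwrite.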
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this chapter
Hinojosa, A.P., Kowitz, C., Heene, M., Pflüger, D., Bungartz, HJ. (2015). Towards a Fault-Tolerant, Scalable Implementation of GENE. In: Mehl, M., Bischoff, M., Schäfer, M. (eds) Recent Trends in Computational Engineering - CE2014. Lecture Notes in Computational Science and Engineering, vol 105. Springer, Cham. https://doi.org/10.1007/978-3-319-22997-3_3
DOI: https://doi.org/10.1007/978-3-319-22997-3_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22996-6
Online ISBN: 978-3-319-22997-3
eBook Packages: Mathematics and Statistics (R0)