Towards a Fault-Tolerant, Scalable Implementation of GENE

Recent Trends in Computational Engineering - CE2014

Abstract

We consider the HPC challenge of fault tolerance in the context of plasma physics simulations using the sparse grid combination technique. In the combination technique formalism, one breaks down a single, highly expensive simulation into many considerably cheaper, independent simulations that are propagated in time and then combined to approximate the full solution. This introduces a new level of parallelism from which various fault tolerance approaches can be derived. We investigate two such approaches, corresponding to two different simulation modes of the plasma physics code GENE: the simulation of a time-dependent, 5-dimensional PDE, and the computation of certain eigenvalues of a problem-specific linear operator. This paper makes two main contributions to the field of fault tolerance with the combination technique. First, we show that the recently developed fault-tolerant combination technique performs well even for highly complex simulation codes, i.e., beyond the usual Poisson or advection problems. Second, we demonstrate a new way to use the optimized combination technique (OptiCom) in the context of fault tolerance when dealing with eigenvalue computations. This work is a building block of the project EXAHD within the DFG’s Priority Programme “Software for Exascale Computing” (SPPEXA).
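To make the idea of combining many cheap simulations concrete, the following is a minimal sketch of how combination coefficients could be computed, and recomputed after one component simulation is lost. It assumes the standard inclusion-exclusion formula for downward-closed sets of level vectors from the sparse grid literature; it is not necessarily the exact scheme used in the chapter. The function names (`classical_index_set`, `combination_coefficients`, `drop_failed`) and the level convention (levels start at 1, with |l|_1 <= n + d - 1) are illustrative assumptions, and the GENE runs themselves are not modelled.

```python
from itertools import product


def classical_index_set(d, n):
    """Level vectors l with l_i >= 1 and |l|_1 <= n + d - 1.

    Assumed convention for this sketch; the set is downward-closed.
    """
    return {l for l in product(range(1, n + 1), repeat=d)
            if sum(l) <= n + d - 1}


def combination_coefficients(index_set):
    """Inclusion-exclusion coefficients for a downward-closed index set.

    c_l = sum over z in {0,1}^d of (-1)^|z| * [l + z in index_set].
    Only non-zero coefficients are returned.
    """
    d = len(next(iter(index_set)))
    coeffs = {}
    for l in index_set:
        c = 0
        for z in product((0, 1), repeat=d):
            shifted = tuple(li + zi for li, zi in zip(l, z))
            if shifted in index_set:
                c += (-1) ** sum(z)
        if c != 0:
            coeffs[l] = c
    return coeffs


def drop_failed(index_set, failed):
    """Remove a failed level and every level that is componentwise >= it.

    This keeps the remaining set downward-closed, so the coefficient
    formula above still applies.
    """
    return {l for l in index_set
            if not all(li >= fi for li, fi in zip(l, failed))}


# Hypothetical scenario: 2 dimensions, level 3, the run on grid (2, 2) fails.
full_set = classical_index_set(2, 3)
classical = combination_coefficients(full_set)
recovered = combination_coefficients(drop_failed(full_set, (2, 2)))

print("classical:", sorted(classical.items()))
print("after losing (2, 2):", sorted(recovered.items()))
```

With d = 2 and n = 3, the classical coefficients are +1 on (1,3), (2,2), (3,1) and -1 on (1,2), (2,1). After dropping (2,2), the recomputed coefficients are +1 on (1,3) and (3,1) and -1 on (1,1). The coarse grid (1,1) was not needed before the failure but is needed afterwards, so a scheme of this kind relies on such coarser solutions being available.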

Notes

  1. Two level vectors \(\boldsymbol{\alpha}, \boldsymbol{\beta}\) satisfy \(\boldsymbol{\alpha} \geq \boldsymbol{\beta}\) if \(\alpha_{i} \geq \beta_{i}\) for all \(i \in \{1, \ldots, d\}\).

  2. Modulo the treatment of the boundary, which in some dimensions might add ±1 discretization points.

  3. http://www.mac.tum.de/wiki/index.php/MAC_Cluster

Acknowledgements

This work was supported (in part) by the German Research Foundation (DFG) through the Priority Programme 1648 “Software for Exascale Computing” (SPPEXA), along with the support of the Technische Universität München – Institute for Advanced Study, funded by the German Excellence Initiative (and the European Union Seventh Framework Programme under grant agreement no. 291763). D. Pflüger further acknowledges the financial support of the DFG within the Cluster of Excellence in Simulation Technology (EXC 310/1), and A. Parra Hinojosa thanks CONACYT, Mexico, for its support.

Author information

Corresponding author

Correspondence to Alfredo Parra Hinojosa.

Appendix: GENE Parameter File

&box
kymin        = 0.3
lv           = 3.00
lw           = 9.00
lx           = 4.16667
adapt_lx     = T
mu_grid_type = 'clenshaw_curtis'
/

&general
nonlinear        = F
arakawa_zv       = T
arakawa_zv_order = 2
calc_dt          = F
dt_max           = 7.39E-4
courant          = 1.0
beta             = 0.1E-02
debye2           = 0.0
collision_op     = 'none'
init_cond        = 'fb'
hyp_z            = 2.000
hyp_v            = 0.5000
/

&geometry
magn_geometry        = 'circular'
q0                   = 1.4
shat                 = 0.8
trpeps               = 0.18
major_R              = 1.0
norm_flux_projection = F
/

&species
name   = 'ions'
omn    = 2.0
omt    = 4.5
mass   = 1.0
temp   = 1.0
dens   = 1.0
charge = 1
/

&species
name   = 'electrons'
omn    = 2.0
omt    = 3.5
mass   = 0.27E-03
temp   = 1.5
dens   = 1.0
charge = -1
/

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Hinojosa, A.P., Kowitz, C., Heene, M., Pflüger, D., Bungartz, H.-J. (2015). Towards a Fault-Tolerant, Scalable Implementation of GENE. In: Mehl, M., Bischoff, M., Schäfer, M. (eds) Recent Trends in Computational Engineering - CE2014. Lecture Notes in Computational Science and Engineering, vol 105. Springer, Cham. https://doi.org/10.1007/978-3-319-22997-3_3
